For datasets, baseline systems, submission guidelines, and the latest updates, please visit: radar-challenge.github.io

Recent advances in speech synthesis and voice conversion have significantly improved the realism of synthetic speech, raising concerns about the misuse of audio deepfakes. However, most detection benchmarks evaluate systems using clean audio recordings.
In real-world scenarios, speech distributed through media platforms rarely remains pristine. Audio uploaded to platforms such as YouTube or shared through social media and messaging applications typically undergoes multiple processing steps during editing and distribution, including resampling, codec compression, loudness normalization, trimming, and mixing with background sounds.
These transformations can significantly degrade the performance of deepfake detection systems, as subtle artifacts used by detectors may be removed or masked by media processing. As a result, models trained on clean data often fail to generalize to real-world media environments.
To address this gap, we propose the Robust Audio DeepfAke Recognition (RADAR) Challenge. Participants must detect synthetic speech after unknown media transformations have been applied to both genuine and spoofed recordings, simulating realistic media editing and distribution pipelines. This setup better reflects practical deployment scenarios where audio content has been edited, recompressed, or otherwise modified before analysis. For additive transformations such as noise or background music, we ensure that speech remains the dominant component.
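To illustrate the "speech remains the dominant component" constraint: additive transformations can be applied at a controlled signal-to-noise ratio. The sketch below is an assumption about how such mixing might be implemented, not the organizers' actual pipeline; it scales a noise signal so the speech-to-noise power ratio matches a target SNR in dB.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix noise into speech at a target SNR (dB); with snr_db > 0 the
    speech stays energetically dominant. Illustrative sketch only."""
    # Match lengths by tiling and truncating the noise.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(speech)]
    # Scale noise so that 10 * log10(P_speech / P_noise) == snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

A positive target SNR (for example 5 to 20 dB) keeps the added background clearly subordinate to the speech.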
Robustness to real-world signal processing is essential for deploying deepfake detection systems in practical environments. Audio shared through online platforms or communication channels often undergoes multiple transformations during editing, compression, and distribution, which can compromise the reliability of detection systems.
The proposed RADAR challenge aims to evaluate and improve the robustness of audio deepfake detection under such conditions. By introducing realistic media transformations into the evaluation process, the challenge better reflects real deployment scenarios and encourages the development of models that generalize beyond clean laboratory data.
Specifically, the challenge aims to benchmark audio deepfake detection under realistic media transformations and to encourage the development of models that generalize beyond clean laboratory data, across both processing conditions and languages.
The results of this challenge will establish a foundational benchmark for future research in media-robust deepfake detection.
The task of the challenge is binary classification: participants must determine whether a given speech recording is bonafide (real speech) or spoof (synthetic).
Unlike traditional benchmarks, the evaluation recordings will undergo unknown media transformations, simulating realistic audio editing and distribution conditions.
The development set will contain English speech, allowing participants to design and tune their systems. The evaluation set will include speech from multiple languages, which will be announced at a later stage. This setup evaluates the ability of detection systems to generalize across both media transformations and languages.
Participants must therefore develop models that remain reliable under these realistic signal conditions. Each submission is a score file containing one detection score per utterance, where higher scores indicate higher confidence that the sample is spoofed.
To assist participants, a baseline system will be provided along with an inference script to demonstrate the expected submission format.
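The authoritative submission format will be demonstrated by the provided inference script. A common convention in such challenges, assumed here purely for illustration, is one space-separated `utterance_id score` pair per line:

```python
def write_score_file(scores, path):
    """Write one 'utt_id score' line per utterance; higher score means
    more likely spoofed. NOTE: this space-separated layout is an assumed
    convention -- follow the organizers' inference script for the real
    format."""
    with open(path, "w") as f:
        for utt_id, score in scores.items():
            f.write(f"{utt_id} {score:.6f}\n")
```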
Dataset Download: All datasets, baseline systems, and submission scripts are available at radar-challenge.github.io
The organizers will provide two datasets for the challenge: a development set and an evaluation set.
Development set: Contains speech that has undergone various media transformations, allowing participants to design and evaluate the robustness of their systems. Released shortly after the challenge begins; derived from the LlamaPartialSpoof full-fake subset (English) with additional transformations.
Evaluation set: Contains speech recordings processed with unknown transformations, covering multiple languages including English. Ground-truth labels will not be provided. Released one week before the challenge submission deadline.
No official training dataset is provided. Participants may use any public datasets, provided they comply with the corresponding licenses and the data is openly available to all participants. LlamaPartialSpoof and LibriTTS (and their derivatives) are NOT allowed. All datasets used must be disclosed in the final system description.
To simulate realistic communication and distribution environments, several transformations will be applied. An utterance may undergo one or more transformations.
Some transformations used in the evaluation set will not appear in the development set, in order to assess generalization capability. Transformation combinations are randomly sampled to simulate diverse real-world processing pipelines, and additional undisclosed transformations with similar characteristics may also be applied to the evaluation set.
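For intuition, a randomly sampled transformation chain might look like the sketch below. The specific operations (crude nearest-neighbor resampling, RMS loudness normalization, trimming) are simplified stand-ins chosen for illustration; the actual challenge pipeline is not disclosed and would use real codecs and processing tools.

```python
import numpy as np

def resample_nearest(x, factor):
    # Crude nearest-neighbor resampling; a real pipeline would use a
    # proper polyphase resampler.
    idx = (np.arange(int(len(x) * factor)) / factor).astype(int)
    return x[np.minimum(idx, len(x) - 1)]

def random_pipeline(x, rng):
    """Apply a randomly sampled chain of simple transformations
    (illustrative stand-ins for resampling, loudness normalization,
    and trimming -- not the challenge's actual operations)."""
    ops = []
    if rng.random() < 0.5:  # downsample by 2, then upsample back
        ops.append(lambda s: resample_nearest(resample_nearest(s, 0.5), 2.0))
    if rng.random() < 0.5:  # loudness normalization to a target RMS
        target_rms = 0.1
        ops.append(lambda s: s * (target_rms / (np.sqrt(np.mean(s ** 2)) + 1e-12)))
    if rng.random() < 0.5:  # trim 5% of the original length from each end
        cut = int(0.05 * len(x))
        ops.append(lambda s: s[cut:len(s) - cut])
    for op in ops:
        x = op(x)
    return x
```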
The primary evaluation metric will be the Equal Error Rate (EER), which is widely used in speaker verification and spoof detection tasks. Leaderboard rankings will be determined based on EER performance on the evaluation set.
Participants will submit detection scores for each evaluation utterance; the final EER will be computed by the organizers from the submitted scores.
No secondary metrics will be used for ranking, although other metrics may be reported for analysis.
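For reference, the EER is the operating point at which the miss rate (spoofed audio classified as bonafide) equals the false-alarm rate (bonafide audio classified as spoofed). A minimal threshold-sweep implementation, assuming higher scores indicate spoof, might look like:

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate for a detector whose scores are higher for spoof.
    Sweeps every candidate threshold and returns the rate where the miss
    and false-alarm curves cross."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.zeros(len(bonafide_scores)),
                             np.ones(len(spoof_scores))])
    labels = labels[np.argsort(scores, kind="stable")]
    n_spoof = labels.sum()
    n_bona = len(labels) - n_spoof
    # After k sorted samples fall below the threshold:
    #   miss = fraction of spoof samples scored below the threshold,
    #   fa   = fraction of bonafide samples scored at or above it.
    miss = np.concatenate([[0.0], np.cumsum(labels) / n_spoof])
    fa = np.concatenate([[1.0], 1.0 - np.cumsum(1 - labels) / n_bona])
    k = np.argmin(np.abs(miss - fa))
    return (miss[k] + fa[k]) / 2.0
```

On perfectly separated scores this returns 0.0; on fully overlapping score distributions it approaches 0.5 (chance level).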
| Challenge announcement: | March 15, 2026 |
| Development data release: | March 25, 2026 |
| Evaluation data release: | April 15, 2026 |
| Result submission deadline: | April 25, 2026 |
| Paper submission: | May 15, 2026 |
| Notification of paper acceptance: | July 15, 2026 |
| Camera-ready GC paper submission: | July 31, 2026 |
| APSIPA conference presentation: | November 9-12, 2026 |