For datasets, baseline systems, submission guidelines, and the latest updates, please visit: radar-challenge.github.io

Recent advances in speech synthesis and voice conversion have significantly improved the realism of synthetic speech, raising concerns about the misuse of audio deepfakes. However, most detection benchmarks evaluate systems using clean audio recordings.
In real-world scenarios, speech distributed through media platforms rarely remains pristine. Audio uploaded to platforms such as YouTube or shared through social media and messaging applications typically undergoes multiple processing steps during editing and distribution, including resampling, codec compression, loudness normalization, trimming, and mixing with background sounds.
These transformations can significantly degrade the performance of deepfake detection systems, as subtle artifacts used by detectors may be removed or masked by media processing. As a result, models trained on clean data often fail to generalize to real-world media environments.
To address this gap, we propose the Robust Audio DeepfAke Recognition (RADAR) Challenge. Participants must detect synthetic speech after unknown media transformations have been applied to both genuine and spoofed recordings, simulating realistic media editing and distribution pipelines. This setup better reflects practical deployment scenarios where audio content has been edited, recompressed, or otherwise modified before analysis. For additive transformations such as noise or background music, we ensure that speech remains the dominant component.
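To illustrate the "speech remains the dominant component" constraint: additive transformations can be applied at a controlled signal-to-noise ratio. The sketch below is an assumption about how such mixing might be implemented, not the organizers' actual pipeline; it scales a noise signal so the speech-to-noise power ratio matches a target SNR in dB.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix noise into speech at a target SNR (dB); with snr_db > 0 the
    speech stays energetically dominant. Illustrative sketch only."""
    # Match lengths by tiling and truncating the noise.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(speech)]
    # Scale noise so that 10 * log10(P_speech / P_noise) == snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

A positive target SNR (for example 5 to 20 dB) keeps the added background clearly subordinate to the speech.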
Robustness to real-world signal processing is essential for deploying deepfake detection systems in practical environments. Audio shared through online platforms or communication channels often undergoes multiple transformations during editing, compression, and distribution, which can compromise the reliability of detection systems.
The proposed RADAR challenge aims to evaluate and improve the robustness of audio deepfake detection under such conditions. By introducing realistic media transformations into the evaluation process, the challenge better reflects real deployment scenarios and encourages the development of models that generalize beyond clean laboratory data.
Specifically, the challenge aims to benchmark audio deepfake detection under realistic media transformations and to encourage the development of models that generalize beyond clean laboratory data, across both processing conditions and languages.
The results of this challenge will establish a foundational benchmark for future research in media-robust deepfake detection.
The task of the challenge is binary classification: participants must determine whether a given speech recording is bonafide (real speech) or spoof (synthetic).
Unlike traditional benchmarks, the evaluation recordings will undergo unknown media transformations, simulating realistic audio editing and distribution conditions.
The development set will contain English speech, allowing participants to design and tune their systems. The evaluation set will include speech from multiple languages, which will be announced at a later stage. This setup evaluates the ability of detection systems to generalize across both media transformations and languages.
Participants must therefore develop models that remain reliable under these realistic signal conditions. Each submission is a score file containing one detection score per utterance, where higher scores indicate higher confidence that the sample is spoofed.
To assist participants, a baseline system will be provided along with an inference script to demonstrate the expected submission format.
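The authoritative submission format will be demonstrated by the provided inference script. A common convention in such challenges, assumed here purely for illustration, is one space-separated `utterance_id score` pair per line:

```python
def write_score_file(scores, path):
    """Write one 'utt_id score' line per utterance; higher score means
    more likely spoofed. NOTE: this space-separated layout is an assumed
    convention -- follow the organizers' inference script for the real
    format."""
    with open(path, "w") as f:
        for utt_id, score in scores.items():
            f.write(f"{utt_id} {score:.6f}\n")
```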
Dataset Download: All datasets, baseline systems, and submission scripts are available at radar-challenge.github.io
The organizers will provide two datasets for the challenge: a development set and an evaluation set.
Development set: Contains speech that has undergone various media transformations, allowing participants to design and evaluate the robustness of their systems. Released shortly after the challenge begins; derived from the LlamaPartialSpoof full-fake subset (English) with additional transformations.
Evaluation set: Contains speech recordings processed with unknown transformations, covering multiple languages including English. Ground-truth labels will not be provided. Released one week before the challenge submission deadline.
No official training dataset is provided. Participants may use any public datasets, provided they comply with the corresponding licenses and the data is openly available to all participants. LlamaPartialSpoof and LibriTTS (and their derivatives) are NOT allowed. All datasets used must be disclosed in the final system description.
To simulate realistic communication and distribution environments, several transformations will be applied. An utterance may undergo one or more transformations.
Some transformations used in the evaluation set will not appear in the development set, in order to assess generalization capability. Transformation combinations are randomly sampled to simulate diverse real-world processing pipelines, and additional undisclosed transformations with similar characteristics may also be applied to the evaluation set.
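For intuition, a randomly sampled transformation chain might look like the sketch below. The specific operations (crude nearest-neighbor resampling, RMS loudness normalization, trimming) are simplified stand-ins chosen for illustration; the actual challenge pipeline is not disclosed and would use real codecs and processing tools.

```python
import numpy as np

def resample_nearest(x, factor):
    # Crude nearest-neighbor resampling; a real pipeline would use a
    # proper polyphase resampler.
    idx = (np.arange(int(len(x) * factor)) / factor).astype(int)
    return x[np.minimum(idx, len(x) - 1)]

def random_pipeline(x, rng):
    """Apply a randomly sampled chain of simple transformations
    (illustrative stand-ins for resampling, loudness normalization,
    and trimming -- not the challenge's actual operations)."""
    ops = []
    if rng.random() < 0.5:  # downsample by 2, then upsample back
        ops.append(lambda s: resample_nearest(resample_nearest(s, 0.5), 2.0))
    if rng.random() < 0.5:  # loudness normalization to a target RMS
        target_rms = 0.1
        ops.append(lambda s: s * (target_rms / (np.sqrt(np.mean(s ** 2)) + 1e-12)))
    if rng.random() < 0.5:  # trim 5% of the original length from each end
        cut = int(0.05 * len(x))
        ops.append(lambda s: s[cut:len(s) - cut])
    for op in ops:
        x = op(x)
    return x
```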
The primary evaluation metric will be the Equal Error Rate (EER), which is widely used in speaker verification and spoof detection tasks. Leaderboard rankings will be determined based on EER performance on the evaluation set.
Participants will submit detection scores for each evaluation utterance; the final EER will be computed by the organizers from the submitted scores.
No secondary metrics will be used for ranking, although other metrics may be reported for analysis.
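For reference, the EER is the operating point at which the miss rate (spoofed audio classified as bonafide) equals the false-alarm rate (bonafide audio classified as spoofed). A minimal threshold-sweep implementation, assuming higher scores indicate spoof, might look like:

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate for a detector whose scores are higher for spoof.
    Sweeps every candidate threshold and returns the rate where the miss
    and false-alarm curves cross."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.zeros(len(bonafide_scores)),
                             np.ones(len(spoof_scores))])
    labels = labels[np.argsort(scores, kind="stable")]
    n_spoof = labels.sum()
    n_bona = len(labels) - n_spoof
    # After k sorted samples fall below the threshold:
    #   miss = fraction of spoof samples scored below the threshold,
    #   fa   = fraction of bonafide samples scored at or above it.
    miss = np.concatenate([[0.0], np.cumsum(labels) / n_spoof])
    fa = np.concatenate([[1.0], 1.0 - np.cumsum(1 - labels) / n_bona])
    k = np.argmin(np.abs(miss - fa))
    return (miss[k] + fa[k]) / 2.0
```

On perfectly separated scores this returns 0.0; on fully overlapping score distributions it approaches 0.5 (chance level).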
| Challenge announcement: | March 15, 2026 |
| Development data release: | March 25, 2026 |
| Evaluation data release: | April 15, 2026 |
| Result submission deadline: | April 25, 2026 |
| Paper submission: | May 15, 2026 |
| Notification of paper acceptance: | July 15, 2026 |
| Camera-ready GC paper submission: | July 31, 2026 |
| APSIPA conference presentation: | November 9-12, 2026 |