Recent advances in foundation models have significantly reshaped speech processing. Large-scale self-supervised speech models and pre-trained automatic speech recognition systems now provide robust representations that can be adapted to downstream tasks with limited labeled data. At the same time, large language models (LLMs) are increasingly being used not as standalone text systems, but as components that strengthen speech pipelines through transcript refinement, contextual decoding, error correction, semantic post-processing, spoken-language interpretation, and interaction support. This combination is particularly promising for pathological speech, where acoustic variability, linguistic atypicality, data scarcity, and high-stakes use conditions expose the limitations of standard speech technologies.
Pathological speech includes speech affected by neurological, motor, developmental, or cognitive conditions, such as dysarthria, aphasia, stuttering, Parkinsonian speech, and speech changes associated with neurodegenerative disorders. From a signal and information processing perspective, this domain is highly challenging: speech signals are often atypical and heterogeneous, available datasets are small and imbalanced, annotations may be noisy, and clinically meaningful errors are not always reflected by standard benchmark metrics. These properties make pathological speech an important and realistic testbed for studying robustness, adaptation, uncertainty, fairness, and interpretability in foundation-model-based speech systems.
This Winter School will provide a focused introduction to trustworthy speech foundation models and LLM-augmented speech modeling for pathological speech. The school will cover the signal characteristics of pathological speech, speech foundation models and their adaptation strategies, and the use of LLMs to enhance speech systems. Particular emphasis will be placed on how LLMs can support decoding, rescoring, transcript repair, semantic error recovery, contextual interpretation, and downstream assistance in clinical and assistive settings. The program will also address trustworthiness issues including robustness, uncertainty, interpretability, fairness, and responsible deployment in healthcare-oriented speech technologies.
By the end of the Winter School, participants will be able to:
| Time | Session Title | Content |
|---|---|---|
| 09:00–09:45 | Session 1. Pathological Speech as a Challenging Domain for Foundation Models | Overview of pathological speech types and their acoustic, prosodic, articulatory, and linguistic variability; downstream tasks such as ASR, assessment, monitoring, and assistive communication; dataset scarcity, annotation noise, and evaluation issues. |
| 09:45–10:30 | Session 2. Speech Foundation Models for Pathological Speech | Self-supervised learning, large pre-trained speech encoders, transfer learning, fine-tuning, parameter-efficient adaptation, and domain mismatch between mainstream speech corpora and pathological speech. |
| 10:30–10:45 | Break | Coffee/tea break. |
| 10:45–11:25 | Session 3. Using LLMs to Enhance Speech Models for Pathological Speech | LLMs for ASR rescoring, transcript correction, semantic repair, context-aware decoding, spoken-language interpretation, instruction-based post-processing, and downstream decision support; opportunities and limits of LLMs in speech pipelines. |
| 11:25–12:00 | Panel Discussion. From Benchmark Performance to Clinical Trustworthiness in Speech+LLM Systems for Pathological Speech | Moderated discussion among experts on robustness, interpretability, uncertainty, fairness, clinically meaningful evaluation, and deployment challenges in pathological speech technology. |
This Winter School is intended for:
Shenzhen Loop Area Institute
Chinese University of Hong Kong, China
Zhengjun Yue is an Assistant Professor at the Center for Language, Intelligence and Machines, Shenzhen Loop Area Institute, Adjunct Assistant Professor at The Chinese University of Hong Kong, Shenzhen, and a Visiting Researcher at King's College London. She received her PhD from the University of Sheffield as a Marie Curie Fellow, and previously worked as an EPSRC-funded postdoctoral researcher at King's College London and as an Assistant Professor at Delft University of Technology. Her research focuses on inclusive speech technologies for healthcare and wellbeing, including pathological speech, speech processing for children and older adults, cognitive impairment analysis, low-resource and underserved populations, assistive interaction, multimodal biomarkers, and explainable and generative AI based on large language models and speech foundation models. She has led and contributed to international projects funded by EU Horizon 2020 and NWO, and serves on the boards of ISCA SIG-SLPAT and SIG-CHILD, while also taking organizational roles in major international conferences such as INTERSPEECH 2025, IEEE SLT 2024, and CISS 2025.
Shenzhen Loop Area Institute
Southeast University, China
Zhaojie Luo, Ph.D., is Associate Professor and Deputy Director of the Department of Brain and Learning Sciences at Southeast University, and a Huawei Zijin Young Scholar. He received his M.S. and Ph.D. degrees from Kobe University, Japan, and previously served as a Research Fellow at the National University of Singapore and as an Assistant Professor at Osaka University (2020–2024). He has published over 40 papers in top-tier journals and conferences, including IEEE/ACM TASLP, IEEE TMM, and IEEE TCSVT. In the past two years, he has led one JSPS project, one NSFC General Program, and one National Key Laboratory Young Scientist project.
FAU Erlangen-Nürnberg, Germany
Paula A. Perez Toro is a Group Leader at Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, where she works on speech and language technologies. Her research focuses on multilingual modeling of pathological speech at the intersection of speech technology, clinical neuroscience, and machine learning. She served as a lecturer in the Department of Electronics and Telecommunications Engineering at the University of Antioquia from January 2019 to July 2024, and was the Activity Leader/Coordinator of the EIT Health M.Sc. in HMDA in 2022.
The artificial intelligence landscape is undergoing a paradigm shift, moving from stateless, text-based prompts to natively multimodal, active, and highly autonomous systems. This half-day Winter School session explores the next generation of AI ecosystems. The program traces the transition from Large Language Models (LLMs) and Vision-Language Models (VLMs) capable of complex cross-modal reasoning to Agentic AI operating systems designed for long-term planning, tool execution, and persistent memory.
Participants will explore cutting-edge applications of these technologies, from industrial robotics to fully automated scientific pipelines such as AutoResearchClaw, while confronting the security and ethical challenges they introduce. The curriculum examines the expanded attack surfaces of foundation models (including cross-modal prompt injection, adversarial jailbreaks, and data poisoning) and the institutional threats posed by autonomous agents, such as the academic "paper mill" transparency crisis.
Crucially, this session bridges academic theory with massive-scale enterprise deployment. Featuring a dedicated spotlight on sovereign foundational AI by SoftBank's SB Intuitions (Japan), the program culminates in a capstone business case study by Masan Group (Vietnam). Attendees will explore how multi-agent systems are actively revolutionizing national-scale supply chains, autonomous store management, and personalized retail discovery, equipping researchers and practitioners with the blueprint to build, secure, and govern the autonomous systems of tomorrow.
By the end of this winter school, participants will be able to:
| Duration | Session Title | Content |
|---|---|---|
| Session 1: Multimodal Large Language Models (90 Minutes) | ||
| 45 Min | Part 1: Application of Multimodal AI | Architectural evolution, industry applications in healthcare and robotics, and generative synthesis using multimodal models. Speaker: Thanh Duc Ngo |
| 45 Min | Part 2: Foundations of Multimodal AI Security | Threat landscape of VLMs/MLLMs, training-time and runtime defenses, and live attack demonstration. Speaker: April Pyone Maung Maung |
| Session 2: Agentic AI (90 Minutes) | ||
| 15 Min | Part 1: The Dawn of Agentic Ecosystems | Architectural evolution from stateless LLMs to stateful Agentic Operating Systems, core capabilities, and industry disruption. Speaker: Huy H. Nguyen |
| 15 Min | Part 2: State-of-the-Art Case Studies in Action | Deep dive into OpenClaw architecture, long-horizon task execution, and AutoResearchClaw autonomous pipelines. Speaker: Huy H. Nguyen |
| 15 Min | Part 3: Security, Governance, and Human-in-the-Loop | The autonomy security paradox, academic publishing transparency crisis, epistemological trust, and transitioning to Human-on-the-Loop. Speaker: Huy H. Nguyen |
| 15 Min | Part 4: SoftBank & SB Intuitions – Pioneering Sovereign AI | SoftBank's "Beyond Carrier" AI vision, AI-RAN revolution, and SB Intuitions' mission for sovereign AI in Japan. Speaker: Koki Wataoka |
| 30 Min | Part 5: Enterprise Agentic AI in Action – Masan Group | Massive-scale application of Agentic AI: multi-agent retail ecosystem, autonomous store management, and intelligent product discovery. Speaker: Duc Chau |
This Winter School is intended for:
SB Intuitions, Japan
Dr. Huy H. Nguyen is a researcher at SB Intuitions. His research focuses on improving the safety, security, and privacy of LLMs and VLMs, as well as the generation and detection of synthetic media. His future research vision includes extending these efforts to safeguard artificial general intelligence (AGI). He earned his Ph.D. from SOKENDAI in collaboration with NII in 2022.
National Institute of Informatics, Japan
Dr. April Pyone Maung Maung is a researcher specializing in machine learning and information security, with a focus on adversarial machine learning. He is currently a Project Assistant Professor at NII. He received his Ph.D. from Tokyo Metropolitan University. His research explores the robustness and security of modern machine learning systems against adversarial threats. Dr. Maung Maung is a recipient of the IEEE ICCE-TW Best Paper Award (2016).
University of Information Technology - VNUHCM, Vietnam
Dr. Thanh Duc Ngo is a researcher and academic leader specializing in computer vision, multimedia information retrieval, and video analysis. He currently serves as Dean of the Faculty of Computer Science and Head of the Multimedia Computing Department at UIT, VNUHCM. He received his Ph.D. from SOKENDAI in collaboration with NII. Dr. Ngo has an extensive publication record spanning visual understanding and large-scale multimedia analysis.
SB Intuitions, Japan
Koki Wataoka leads the Responsible AI Team in the Data & Safety Department of the R&D Headquarters at SB Intuitions, where he oversees research and development to advance the safety of LLMs and VLMs. He earned his master's degree from Kobe University in 2021. He previously worked at LINE Corporation focusing on the safety of large-scale language models, and moved to SB Intuitions in 2023.
Masan Group, Vietnam
Dr. Duc Chau is an AI expert and industry research leader specializing in deep learning, signal processing, and agentic AI. He serves as Director of AI at Masan Group, leading the deployment of multi-agent AI systems across retail and supply chain ecosystems. His expertise spans speech technologies and document image analysis. Previously, he was Head of Research at Cinnamon AI and Head of R&D at Zalo AI. Dr. Chau received his Ph.D. in Computer and Information Sciences from JAIST.