Trustworthy Speech Foundation Models and LLM-Augmented Modeling for Pathological Speech

Introduction

Recent advances in foundation models have significantly reshaped speech processing. Large-scale self-supervised speech models and pre-trained automatic speech recognition systems now provide robust representations that can be adapted to downstream tasks with limited labeled data. At the same time, large language models (LLMs) are increasingly being used not as standalone text systems, but as components that strengthen speech pipelines through transcript refinement, contextual decoding, error correction, semantic post-processing, spoken-language interpretation, and interaction support. This combination is particularly promising for pathological speech, where acoustic variability, linguistic atypicality, data scarcity, and high-stakes use conditions expose the limitations of standard speech technologies.

Pathological speech includes speech affected by neurological, motor, developmental, or cognitive conditions, such as dysarthria, aphasia, stuttering, Parkinsonian speech, and speech changes associated with neurodegenerative disorders. From a signal and information processing perspective, this domain is highly challenging: speech signals are often atypical and heterogeneous, available datasets are small and imbalanced, annotations may be noisy, and clinically meaningful errors are not always reflected by standard benchmark metrics. These properties make pathological speech an important and realistic testbed for studying robustness, adaptation, uncertainty, fairness, and interpretability in foundation-model-based speech systems.

This Winter School will provide a focused introduction to trustworthy speech foundation models and LLM-augmented speech modeling for pathological speech. The school will cover the signal characteristics of pathological speech, speech foundation models and their adaptation strategies, and the use of LLMs to enhance speech systems. Particular emphasis will be placed on how LLMs can support decoding, rescoring, transcript repair, semantic error recovery, contextual interpretation, and downstream assistance in clinical and assistive settings. The program will also address trustworthiness issues including robustness, uncertainty, interpretability, fairness, and responsible deployment in healthcare-oriented speech technologies.

Learning Objectives

By the end of the Winter School, participants will be able to:

Explain the signal-processing and machine-learning challenges posed by pathological speech.
Describe how self-supervised and pre-trained speech foundation models can be adapted to atypical speech.
Understand how LLMs can be used to improve speech pipelines through contextual modeling, transcript refinement, rescoring, and semantic interpretation.
Identify trustworthiness challenges in pathological speech technology, including robustness, fairness, uncertainty, interpretability, and domain shift.
Evaluate practical modeling strategies for limited, noisy, heterogeneous, and clinically realistic datasets.
Recognize open research questions in combining speech foundation models and LLMs for pathological speech applications.

Tentative Program

Time	Session Title	Content
09:00–09:45	Session 1. Pathological Speech as a Challenging Domain for Foundation Models	Overview of pathological speech types and their acoustic, prosodic, articulatory, and linguistic variability; downstream tasks such as ASR, assessment, monitoring, and assistive communication; dataset scarcity, annotation noise, and evaluation issues.
09:45–10:30	Session 2. Speech Foundation Models for Pathological Speech	Self-supervised learning, large pre-trained speech encoders, transfer learning, fine-tuning, parameter-efficient adaptation, and domain mismatch between mainstream speech corpora and pathological speech.
10:30–10:45	Break	Coffee/tea break.
10:45–11:25	Session 3. Using LLMs to Enhance Speech Models for Pathological Speech	LLMs for ASR rescoring, transcript correction, semantic repair, context-aware decoding, spoken-language interpretation, instruction-based post-processing, and downstream decision support; opportunities and limits of LLMs in speech pipelines.
11:25–12:00	Panel Discussion. From Benchmark Performance to Clinical Trustworthiness in Speech+LLM Systems for Pathological Speech	Moderated discussion among experts on robustness, interpretability, uncertainty, fairness, clinically meaningful evaluation, and deployment challenges in pathological speech technology.

Who Should Attend

This Winter School is intended for:

PhD students and research master's students in speech processing, signal processing, machine learning, language technology, healthcare AI, or related fields;
Early-career researchers interested in foundation models for speech and human-centered AI;
Practitioners seeking a structured introduction to pathological speech technology and trustworthy speech+LLM systems.

Organizers and Instructors

Zhengjun Yue

Shenzhen Loop Area Institute
The Chinese University of Hong Kong, Shenzhen, China

Zhengjun Yue is an Assistant Professor at the Center for Language, Intelligence and Machines, Shenzhen Loop Area Institute, an Adjunct Assistant Professor at The Chinese University of Hong Kong, Shenzhen, and a Visiting Researcher at King's College London. She received her PhD from the University of Sheffield as a Marie Curie Fellow, and previously worked as an EPSRC-funded postdoctoral researcher at King's College London and as an Assistant Professor at Delft University of Technology. Her research focuses on inclusive speech technologies for healthcare and wellbeing, including pathological speech, speech processing for children and older adults, cognitive impairment analysis, low-resource and underserved populations, assistive interaction, multimodal biomarkers, and explainable and generative AI based on large language models and speech foundation models. She has led and contributed to international projects funded by EU Horizon 2020 and NWO. She serves on the boards of ISCA SIG-SLPAT and SIG-CHILD and has taken organizational roles in major international conferences such as INTERSPEECH 2025, IEEE SLT 2024, and CISS 2025.

Zhaojie Luo

Shenzhen Loop Area Institute
Southeast University, China

Zhaojie Luo, Ph.D., is an Associate Professor and Deputy Director of the Department of Brain and Learning Sciences at Southeast University, and a Huawei Zijin Young Scholar. He received his M.S. and Ph.D. degrees from Kobe University, Japan, and previously served as a Research Fellow at the National University of Singapore and as an Assistant Professor at Osaka University (2020–2024). He has published over 40 papers in top-tier journals and conferences, including IEEE/ACM TASLP, IEEE TMM, and IEEE TCSVT. In the past two years, he has led one JSPS project, one NSFC General Program project, and one National Key Laboratory Young Scientist project.

Paula A. Perez Toro

FAU Erlangen-Nürnberg, Germany

Paula A. Perez Toro is a Group Leader at Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, where she continues her academic and research activities in speech and language technologies. Her research focuses on multilingual modeling of pathological speech at the intersection of speech technology, clinical neuroscience, and machine learning. She served as a lecturer in the Department of Electronics and Telecommunications Engineering at the University of Antioquia from January 2019 to July 2024, and was the Activity Leader/Coordinator of the EIT Health M.Sc. in HMDA in 2022.

Foundation Models and Agentic AI

Theme: Beyond the Prompt: Security and Applications in Multimodal Models and Agentic Ecosystems

Introduction

The artificial intelligence landscape is undergoing a radical paradigm shift, moving from stateless, text-based prompts to natively multimodal, active, and highly autonomous systems. This half-day Winter School session offers a comprehensive exploration of the next generation of AI ecosystems. The program traces the transition from Large Language Models (LLMs) and Vision-Language Models (VLMs) capable of complex cross-modal reasoning to agentic AI operating systems designed for long-term planning, tool execution, and persistent memory.

Participants will explore cutting-edge applications of these technologies, ranging from industrial robotics to fully automated scientific pipelines such as AutoResearchClaw, while rigorously confronting the security and ethical challenges they introduce. The curriculum examines the expanded attack surfaces of foundation models (including cross-modal prompt injection, adversarial jailbreaks, and data poisoning) and the institutional threats posed by autonomous agents, such as the academic "paper mill" transparency crisis.

Crucially, this session bridges academic theory with large-scale enterprise deployment. It features a dedicated spotlight on sovereign AI by SoftBank's SB Intuitions (Japan) and culminates in a capstone business case study by Masan Group (Vietnam). Attendees will explore how multi-agent systems are being applied to national-scale supply chains, autonomous store management, and personalized retail discovery, equipping researchers and practitioners with a blueprint to build, secure, and govern the autonomous systems of tomorrow.

Learning Objectives

By the end of this Winter School, participants will be able to:

Comprehend the architectural principles and training paradigms underlying modern multimodal AI systems, including vision-language and multimodal large language models.
Analyze real-world industrial applications of multimodal AI, with an emphasis on deployment contexts, system integration, and practical constraints.
Identify and systematically categorize security threats and defense mechanisms in multimodal AI, including adversarial attacks and robustness strategies.
Understand the design and operational principles of agentic AI ecosystems, including tool use, planning, memory, and interaction with external environments.
Evaluate current industrial practices for developing trustworthy multimodal systems within increasingly autonomous and agentic settings, including safety, reliability, and governance considerations.
Apply theoretical concepts through hands-on demonstrations and case studies, enabling practical understanding of vulnerabilities and mitigation techniques in multimodal systems.

Tentative Program

Duration	Session Title	Content
Session 1: Multimodal Large Language Models (90 Minutes)
45 Min	Part 1: Application of Multimodal AI	Architectural evolution, industry applications in healthcare and robotics, and generative synthesis using multimodal models. Speaker: Thanh Duc Ngo
45 Min	Part 2: Foundations of Multimodal AI Security	Threat landscape of VLMs/MLLMs, training-time and runtime defenses, and live attack demonstration. Speaker: April Pyone Maung Maung
Session 2: Agentic AI (90 Minutes)
15 Min	Part 1: The Dawn of Agentic Ecosystems	Architectural evolution from stateless LLMs to stateful Agentic Operating Systems, core capabilities, and industry disruption. Speaker: Huy H. Nguyen
15 Min	Part 2: State-of-the-Art Case Studies in Action	Deep dive into OpenClaw architecture, long-horizon task execution, and AutoResearchClaw autonomous pipelines. Speaker: Huy H. Nguyen
15 Min	Part 3: Security, Governance, and Human-in-the-Loop	The autonomy security paradox, academic publishing transparency crisis, epistemological trust, and transitioning to Human-on-the-Loop. Speaker: Huy H. Nguyen
15 Min	Part 4: SoftBank & SB Intuitions – Pioneering Sovereign AI	SoftBank's "Beyond Carrier" AI vision, AI-RAN revolution, and SB Intuitions' mission for sovereign AI in Japan. Speaker: Koki Wataoka
30 Min	Part 5: Enterprise Agentic AI in Action - Masan Group	Massive-scale application of Agentic AI. Multi-agent retail ecosystem, autonomous store management, and intelligent product discovery. Speaker: Duc Chau

Who Should Attend

This Winter School is intended for:

Graduate students (Master's and PhD) in computer science, cybersecurity, machine learning, or related fields who are looking to build a research foundation in AI security.
Early-career researchers and postdocs who work with multimodal models or language model applications and want to understand the security implications of their work.
Industry practitioners who are beginning to encounter AI-specific threats in their organizations and need a structured introduction to the threat landscape.

List of Instructors

Huy H. Nguyen

SB Intuitions, Japan

Dr. Huy H. Nguyen is a researcher at SB Intuitions. His research focuses on improving the safety, security, and privacy of LLMs and VLMs, as well as the generation and detection of synthetic media. His future research vision includes extending these efforts to safeguard artificial general intelligence (AGI). He earned his Ph.D. from SOKENDAI in collaboration with NII in 2022.

April Pyone Maung Maung

National Institute of Informatics, Japan

Dr. April Pyone Maung Maung is a researcher specializing in machine learning and information security, with a focus on adversarial machine learning. He is currently a Project Assistant Professor at NII. He received his Ph.D. from Tokyo Metropolitan University. His research explores the robustness and security of modern machine learning systems against adversarial threats. Dr. Maung Maung is a recipient of the IEEE ICCE-TW Best Paper Award (2016).

Thanh Duc Ngo

University of Information Technology - VNUHCM, Vietnam

Dr. Thanh Duc Ngo is a researcher and academic leader specializing in computer vision, multimedia information retrieval, and video analysis. He currently serves as Dean of the Faculty of Computer Science and Head of the Multimedia Computing Department at UIT, VNUHCM. He received his Ph.D. from SOKENDAI in collaboration with NII. Dr. Ngo has an extensive publication record spanning visual understanding and large-scale multimedia analysis.

Koki Wataoka

SB Intuitions, Japan

Koki Wataoka leads the Responsible AI Team in the Data & Safety Department of the R&D Headquarters at SB Intuitions, where he oversees research and development to advance the safety of LLMs and VLMs. He earned his master's degree from Kobe University in 2021. He previously worked at LINE Corporation focusing on the safety of large-scale language models, and moved to SB Intuitions in 2023.

Duc Chau

Masan Group, Vietnam

Dr. Duc Chau is an AI expert and industry research leader specializing in deep learning, signal processing, and agentic AI. He serves as Director of AI at Masan Group, leading the deployment of multi-agent AI systems across retail and supply chain ecosystems. His expertise spans speech technologies and document image analysis. Previously, he was Head of Research at Cinnamon AI and Head of R&D at Zalo AI. Dr. Chau received his Ph.D. in Computer and Information Sciences from JAIST.
