Language model security stands at a critical juncture. Despite extensive work on adversarial attacks and basic defenses, we still lack a deep understanding of the principles that drive these vulnerabilities: the mathematical and computational properties that create them, how model internals process adversarial inputs, and whether current evaluations capture real-world security risks.
This workshop brings together researchers in adversarial robustness, conversational and sociotechnical AI safety, and broader LLM security to move beyond surface-level observations—probing the mechanisms behind vulnerabilities and charting a path toward genuinely secure architectures.
Emphasizing foundational understanding over incremental improvements, we ask: What mathematical and computational properties give rise to these vulnerabilities? How do model internals process adversarial inputs? And do current evaluations capture the security risks that matter in practice?
Our goal is to catalyze rigorous, cross-disciplinary discussion that advances the theoretical, empirical, and evaluative foundations of language model security.
The workshop consists of four thematic blocks. Each block includes an expert keynote (45 minutes), two contributed talks (15 minutes each), and an extended guided discussion (45 minutes) among participants, presenters, and domain experts. Our format prioritizes deep engagement and discussion over talk density.
We invite short contributed talks that advance the foundations of language model security. We are especially interested in work that clarifies the mathematical and computational properties underlying vulnerabilities, sheds light on how model internals process adversarial inputs, and proposes evaluation frameworks that better capture real-world security risks.
Submissions will be assigned to thematic blocks by the organizers. Each submission receives reviews from three authors randomly selected from other thematic blocks. Selection emphasizes fit with the workshop's foundational themes, clarity, novelty of insight, and potential to generate discussion. We explicitly encourage work-in-progress and preliminary findings that advance foundational understanding.
For questions about the workshop, please contact:
egor.zverev@ist.ac.at
EurIPS 2025 Workshop on Foundations of Language Model Security
December 6-7, 2025 • Copenhagen, Denmark