Research Papers
Research outputs from the AI4FM group at Loyola University Chicago.
Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation
Accepted
The first systematic evaluation of LLM-based TLA+ specification synthesis from natural language. Evaluates 30 LLMs across eight families on 205 TLA+ specifications using the SANY parser and TLC model checker. LLMs achieve only 8.6% semantic correctness. Model size does not predict quality on formal languages.
ChatTLA+: Using LLMs for TLA+ Formal Specification Generation and Verification
Undergraduate Research and Engagement Symposium, April 17, 2026 - Eric Spencer
A presentation exploring the use of large language models to generate and verify TLA+ formal specifications. Builds on the group’s systematic evaluation research, extending it toward practical generation and verification pipelines.
Large Language Models (LLM) and Temporal Logic of Actions (TLA): How Effective are LLMs for Verification Systems
GSIRS 2026 - Ortiz, Bisharat, Abuhamad, Läufer, Spencer, Bhadauria, Thiruvathukal, Wang
Presented at the Loyola University Chicago Graduate School Interdisciplinary Research Symposium, April 11, 2026. The first systematic evaluation of 30 LLMs on 205 TLA+ specifications, validated with the SANY parser and TLC model checker.
Automating TLA+ Model Synthesis with LLMs and Formal Verification Pipelines
GCASR 2025 - Ortiz, Abuhamad, Wang, Thiruvathukal, Läufer
Presented at the 12th Greater Chicago Area Systems Research Workshop, May 8, 2025, Loyola University Chicago. Presents a pipeline for automating TLA+ model synthesis from natural language using LLMs, validated with SANY and TLC.
TLA+ for All: Running Model Checking in a Python Notebook
figshare, February 2025 - Läufer, Thiruvathukal
Integrates TLA+ model checking into Python notebook environments, making formal verification accessible without specialized tooling. Supports literate modeling, reproducibility, and barrier-free access for teaching and research.
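The notebook workflow described above boils down to invoking the standard TLA+ tools from Python. A minimal sketch, assuming `tla2tools.jar` (the standard TLA+ tools distribution, which bundles both SANY and TLC) is available locally; the file names and helper functions here are illustrative, not taken from the paper:

```python
# Sketch: run SANY (parser) and TLC (model checker) on a spec from Python,
# e.g. inside a notebook cell. Assumes Java and tla2tools.jar are available.
import subprocess
from pathlib import Path

TLA_TOOLS = Path("tla2tools.jar")  # assumed location; adjust as needed

def sany_command(spec: Path) -> list[str]:
    # SANY is the TLA+ syntax and semantic analyzer (main class tla2sany.SANY).
    return ["java", "-cp", str(TLA_TOOLS), "tla2sany.SANY", str(spec)]

def tlc_command(spec: Path, config: Path) -> list[str]:
    # TLC is the explicit-state model checker (main class tlc2.TLC);
    # -config names the .cfg file with constants and properties to check.
    return ["java", "-cp", str(TLA_TOOLS), "tlc2.TLC", "-config", str(config), str(spec)]

def validate(spec: Path, config: Path) -> bool:
    # A spec passes only if it both parses (SANY) and model-checks (TLC).
    for cmd in (sany_command(spec), tlc_command(spec, config)):
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            return False
    return True
```

Calling `validate(Path("Spec.tla"), Path("Spec.cfg"))` from a notebook cell then gives a single pass/fail signal per specification, which is the kind of barrier-free check the paper aims to make routine.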