AI for Formal Methods#
Welcome to AI4FM - the AI for Formal Methods research group at Loyola University Chicago, part of the Department of Computer Science. We advance formal methods, rigorous system design, and reproducible tools at the intersection of AI, logic, mathematics, and real-world computing.
Featured Research#
Can LLMs Write Correct TLA+ Specifications?
Evaluating Natural-Language-to-TLA+ Generation - Under Submission
The first systematic evaluation of LLM-based TLA+ specification synthesis from natural language. We evaluate 30 LLMs across eight families on a curated dataset of 205 TLA+ specifications, validated by both the SANY parser and the TLC model checker. LLMs achieve up to 26.6% syntactic correctness but only 8.6% semantic correctness. Results show that model size does not predict quality and that code-specialized models consistently underperform on formal languages.
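The two correctness rates come from a two-stage check: a spec is syntactically correct if SANY parses it, and semantically correct only if it also passes TLC. A minimal sketch of that scoring, assuming each generated spec has already been tagged with its per-stage outcome (the `SpecResult` fields here are illustrative, not the paper's actual schema):

```python
from dataclasses import dataclass

@dataclass
class SpecResult:
    # Outcomes of the two validation stages (field names are hypothetical).
    sany_parsed: bool   # the SANY parser accepted the specification
    tlc_passed: bool    # TLC model-checked it without error

def correctness_rates(results: list[SpecResult]) -> tuple[float, float]:
    """Return (syntactic %, semantic %) over a batch of generated specs."""
    n = len(results)
    syntactic = sum(r.sany_parsed for r in results) / n * 100
    # A spec only counts as semantically correct if it also parses.
    semantic = sum(r.sany_parsed and r.tlc_passed for r in results) / n * 100
    return round(syntactic, 1), round(semantic, 1)
```

The gap between the two numbers is the point of the study: many generations that look like valid TLA+ still fail model checking.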
Research Areas#
Formal Specification and Verification
Building datasets, tooling, and infrastructure around TLA+ and other formal specification languages. Making model checking accessible through open pipelines and notebook-based workflows.
LLMs for Formal Methods
Evaluating whether large language models can generate semantically correct formal specifications. Identifying where LLMs fail on rigorous specification tasks and why.
Empirical Software Engineering
Studying software artifacts at scale: pre-trained model naming conventions, supply chain security, and development practices in real-world open-source systems.
Security and Systems
IoT security research including signal injection attacks on pairing protocols, plus HPC education tools and agentic tutoring systems for parallel computing.
Meet the Team#
Research Updates#
Sunday, March 29, 2026: Can LLMs Write Correct TLA+ Specifications? Our New Evaluation Study
We have submitted a new paper evaluating whether large language models can generate semantically correct TLA+ specifications from natural language. We evaluated 30 LLMs across eight families on 205 TLA+ specifications. The best semantic correctness achieved was only 8.6%, and model size did not predict quality.
Wednesday, February 05, 2025: TLA+ for All: Running Model Checking in a Python Notebook
We published a project embedding TLA+ model checking directly into a Python notebook environment, making formal verification accessible without any installation. Users can write specifications, run the TLC model checker, and visualize behaviors in-browser.
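Under the hood, TLC is a Java program distributed in `tla2tools.jar`, so a notebook can drive it through a subprocess. A minimal sketch of that wiring, assuming the jar has been fetched into the working directory (the jar path and function names here are assumptions, not the project's actual API):

```python
import shutil
import subprocess

def tlc_command(spec: str, jar: str = "tla2tools.jar") -> list[str]:
    # TLC's entry point is the tlc2.TLC class inside tla2tools.jar;
    # the jar location is an assumption about the notebook environment.
    return ["java", "-cp", jar, "tlc2.TLC", spec]

def run_tlc(spec: str, jar: str = "tla2tools.jar") -> str:
    """Model-check a .tla file with TLC and return its combined output."""
    if shutil.which("java") is None:
        raise RuntimeError("A Java runtime is required to run TLC")
    proc = subprocess.run(tlc_command(spec, jar),
                          capture_output=True, text=True)
    return proc.stdout + proc.stderr
```

For example, `run_tlc("Counter.tla")` would return TLC's state-space report for that spec, which the notebook can then parse or display inline.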