All Posts
Formal Methods
03-29-2026
Can LLMs Write Correct TLA+ Specifications? Our New Evaluation Study
We have submitted a new paper evaluating whether large language models can generate semantically correct TLA+ specifications from natural language. We evaluated 30 LLMs across eight model families on 205 TLA+ specification tasks. The best-performing model achieved only 8.6% semantic correctness, and model size did not predict specification quality.
02-05-2025
TLA+ for All: Running Model Checking in a Python Notebook
We published a project embedding TLA+ model checking directly into a Python notebook environment, making formal verification accessible without any local installation. Users can write specifications, run the TLC model checker, and visualize behaviors directly in the browser.
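To give a flavor of the write-then-check workflow described above, here is a minimal sketch of driving TLC from Python. Everything in it is an illustrative assumption, not the project's actual API: the `Counter` module, the file names, and the `tla2tools.jar` classpath are invented for the example, and the command is only constructed, not executed, since the JVM and jar may not be available.

```python
import textwrap

# Hypothetical toy spec: a counter that only increments, with a simple
# invariant. Module name and contents are illustrative assumptions.
spec = textwrap.dedent("""
    ---- MODULE Counter ----
    EXTENDS Naturals
    VARIABLE x
    Init == x = 0
    Next == x' = x + 1
    Inv  == x >= 0
    ========================
""").strip()

# TLC reads its init/next/invariant choices from a .cfg file.
config = "INIT Init\nNEXT Next\nINVARIANT Inv\n"

with open("Counter.tla", "w") as f:
    f.write(spec)
with open("Counter.cfg", "w") as f:
    f.write(config)

# TLC ships in tla2tools.jar; a notebook cell could shell out to it with a
# command like this (path to the jar is an assumption about the environment):
cmd = ["java", "-cp", "tla2tools.jar", "tlc2.TLC",
       "-config", "Counter.cfg", "Counter.tla"]
```

A notebook-embedded setup like the one announced here would hide this plumbing behind cells, but the underlying steps, write the `.tla` and `.cfg` files and invoke TLC, are the same.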