All Posts
Formal Methods
03-29-2026
Can LLMs Write Correct TLA+ Specifications? Our New Evaluation Study
We have submitted a new paper evaluating whether large language models can generate semantically correct TLA+ specifications from natural language. We evaluated 30 LLMs across eight model families on 205 TLA+ specification tasks. The best-performing model achieved only 8.6% semantic correctness, and model size did not predict specification quality.
02-05-2025
TLA+ for All: Running Model Checking in a Python Notebook
We published a project embedding TLA+ model checking directly into a Python notebook environment, making formal verification accessible without any local installation. Users can write specifications, run the TLC model checker, and visualize behaviors directly in the browser.
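To give a flavor of the write-then-check workflow described above, here is a minimal sketch of driving TLC from Python. Everything in it is an illustrative assumption, not the project's actual API: the `Counter` module, the file names, and the `tla2tools.jar` classpath are invented for the example, and the command is only constructed, not executed, since the JVM and jar may not be available.

```python
import textwrap

# Hypothetical toy spec: a counter that only increments, with a simple
# invariant. Module name and contents are illustrative assumptions.
spec = textwrap.dedent("""
    ---- MODULE Counter ----
    EXTENDS Naturals
    VARIABLE x
    Init == x = 0
    Next == x' = x + 1
    Inv  == x >= 0
    ========================
""").strip()

# TLC reads its init/next/invariant choices from a .cfg file.
config = "INIT Init\nNEXT Next\nINVARIANT Inv\n"

with open("Counter.tla", "w") as f:
    f.write(spec)
with open("Counter.cfg", "w") as f:
    f.write(config)

# TLC ships in tla2tools.jar; a notebook cell could shell out to it with a
# command like this (path to the jar is an assumption about the environment):
cmd = ["java", "-cp", "tla2tools.jar", "tlc2.TLC",
       "-config", "Counter.cfg", "Counter.tla"]
```

A notebook-embedded setup like the one announced here would hide this plumbing behind cells, but the underlying steps, write the `.tla` and `.cfg` files and invoke TLC, are the same.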