Can LLMs Write Correct TLA+ Specifications? Our New Evaluation Study

We have submitted a new paper evaluating whether large language models can generate semantically correct TLA+ specifications from natural language. The study covers 30 LLMs across eight families and 205 TLA+ specifications. The best semantic correctness any model achieved was only 8.6%, and model size did not predict quality.

Read the full paper details

Note

This paper is currently under submission.