Large Language Models (LLM) and Temporal Logic of Actions (TLA): How Effective are LLMs for Verification Systems

Large Language Models (LLM) and Temporal Logic of Actions (TLA): How Effective are LLMs for Verification Systems#

Status:: Presented
Venue:: GSIRS 2026 - Graduate School Interdisciplinary Research Symposium, April 11, 2026, Loyola University Chicago
Authors:: Brian Ortiz, Arslan Bisharat, Mohammed Abuhamad, Konstantin Läufer, Eric Spencer, Khushboo Bhadauria, George K. Thiruvathukal, TaiNing Wang
Institution:: Loyola University Chicago

Abstract#

TLA+ has supported industrial verification at companies such as Amazon and Microsoft, yet writing correct TLA+ specifications from natural language still requires time and expertise, which limits adoption. LLMs show promise, but no prior study measures whether they produce semantically correct TLA+ specifications from natural language.

This paper presents the first systematic evaluation of LLM-based TLA+ specification synthesis from natural language. Our study evaluates 30 LLMs across eight families on a curated dataset of 205 TLA+ specifications: 25 open-weight models across four prompting strategies (2,600 runs) and 5 proprietary models under few-shot prompting (130 runs), all validated by the SANY parser and TLC model checker. LLMs achieve up to 26.6% syntactic correctness but only 8.6% semantic correctness, with successes exclusive to progressive prompting. Results show that model size does not predict quality, and code-specialized models consistently underperform due to negative transfer from mainstream language training.

Based On#

This poster is based on the paper Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation, currently under submission.

Citation#

Brian Ortiz, Arslan Bisharat, Mohammed Abuhamad, Konstantin Läufer, Eric Spencer, Khushboo Bhadauria, George K. Thiruvathukal, and TaiNing Wang, Large Language Models (LLM) and Temporal Logic of Actions (TLA): How Effective are LLMs for Verification Systems, GSIRS 2026, April 11, 2026.

Large Language Models (LLM) and Temporal Logic of Actions (TLA): How Effective are LLMs for Verification Systems

Contents

Large Language Models (LLM) and Temporal Logic of Actions (TLA): How Effective are LLMs for Verification Systems#

Abstract#

Based On#

Citation#