Autoformalization and Deductive Reasoning

Formal representations promise precision and verifiability, but only if the translation from natural language is faithful. This thread studies where that translation breaks down: in deductive reasoning, in temporal reasoning, and in the general problem of converting human statements into logic, programs, or structured queries.

Our ECAI 2025 paper "Investigating the Robustness of Deductive Reasoning with Large Language Models" tests how small prompt and input variations affect deductive reasoning, characterises common failure modes, and proposes an evaluation setup for robustness (Hoppe et al., 2025). Our ACL 2025 paper "ChronoSense" examines temporal understanding using Allen-style event intervals and highlights current gaps in interval reasoning (Islakoglu & Kalo, 2025). Ongoing work that we supervise extends this line to autoformalization from natural language to first-order logic, trained with GRPO (Group Relative Policy Optimization).
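To make "Allen-style event intervals" concrete, here is a minimal sketch of how the thirteen relations of Allen's interval algebra can be computed for two intervals. It is an illustration only and is not taken from the ChronoSense paper or its benchmark code: the names `Interval` and `allen_relation` are hypothetical, and the example assumes integer time points with start strictly before end.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A closed time interval; hypothetical helper type for this sketch."""
    start: int
    end: int


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from a to b."""
    if a.start >= a.end or b.start >= b.end:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"

    # Intervals sharing one or both endpoints.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"

    # Strict containment in either direction.
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"

    # Only partial overlap remains.
    return "overlaps" if a.start < b.start else "overlapped-by"


if __name__ == "__main__":
    # Made-up event years, used purely as example data.
    event_a = Interval(1990, 1995)
    event_b = Interval(1993, 2000)
    print(allen_relation(event_a, event_b))
```

Exactly one of the thirteen relations holds for any pair of valid intervals, so an interval-reasoning question can be scored by comparing a model's answer against the label this function returns.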

References

2025

ECAI

Investigating the Robustness of Deductive Reasoning with Large Language Models

Fabian Hoppe, Filip Ilievski, and Jan-Christoph Kalo

In Proceedings of the 27th European Conference on Artificial Intelligence (ECAI 2025), 2025

ACL

ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events

Duygu Sezen Islakoglu and Jan-Christoph Kalo

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 2025
