Research

My research studies what happens at the boundaries between different representations of knowledge — text, tables, knowledge graphs, queries, logical formulas, and language model parameters. Moving between these forms is never lossless, and whether two representations encode the same knowledge depends on what population, time frame, or scope is assumed. Expressing the same knowledge in multiple forms and studying where the versions disagree is a diagnostic: it tells us what each representation captures, what it silently drops, and where context is doing hidden work.

Autoformalization and Deductive Reasoning

Translating natural-language statements into formal representations (logic, queries) and studying how LLMs perform deductive and temporal reasoning once knowledge is made explicit. See related projects →

Knowledge Base Construction from Language Models

Treating LLMs as compressed, implicit knowledge bases — extracting and evaluating their factual content, and comparing it to structured sources such as Wikidata. See related projects →

Text-to-SQL over Real-World Statistical Data

Working with Statistics Netherlands (CBS) on translating natural-language questions into SQL over complex statistical tables, where schema design encodes implicit context about populations and time frames. Home of the LOCuST benchmark. See related projects →

Patient-Centred Endpoints in Digital Health (UNIFIED)

Data integration and semantic harmonisation for patient-centred clinical-study endpoints derived from digital health technologies, as part of the EU Innovative Health Initiative project UNIFIED. See related project →