Research | Jan-Christoph Kalo

Towards a Science of Knowledge Translation

My research studies what happens at the boundaries between different representations of knowledge — text, tables, knowledge graphs, queries, logical formulas, and language model parameters. Moving between these forms is often lossy and context-dependent, and whether two representations encode the same knowledge depends on what population, time frame, or scope is assumed. Expressing the same knowledge in multiple forms and studying where the versions disagree is a diagnostic: it tells us what each representation captures, what it silently drops, and where context is doing hidden work.

The aim is to turn this diagnostic stance into a predictive theory: when is a mapping between representations faithful, what does it silently drop, and can representational loss be measured rather than merely observed? The long-term goal is to know in advance which kinds of knowledge survive which translations.

The research programme, drawn as what it is — a graph. Nodes are representations of knowledge; labelled edges are the research threads that map one into another, coloured by pillar (each edge links to its pillar below).

The three pillars below each work a bundle of these edges.

Between Language Models and Knowledge Graphs

What do language models actually know, and how does it line up with what knowledge graphs state? This pillar works the interface in all three directions: extraction, consistency, and injection.

Knowledge base construction from language models. Treating LLMs as compressed, implicit knowledge bases — extracting and evaluating their factual content, and comparing it to structured sources such as Wikidata.
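As a rough illustration of what comparing extracted facts to a structured source can look like, the sketch below scores a set of model-extracted triples against a reference set. The function name compare_triples, the predicate label capital_of, and the "extracted" triples are all hypothetical and chosen for illustration only; they are not taken from any specific system or dataset.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def compare_triples(extracted: Set[Triple], reference: Set[Triple]) -> Dict[str, object]:
    """Compare model-extracted triples with a reference knowledge graph."""
    shared = extracted & reference
    return {
        # Share of extracted triples that the reference confirms.
        "precision": len(shared) / len(extracted) if extracted else 0.0,
        # Share of reference triples that the model reproduced.
        "recall": len(shared) / len(reference) if reference else 0.0,
        # The disagreements are the diagnostically interesting part.
        "only_in_model": extracted - reference,
        "only_in_reference": reference - extracted,
    }


# Toy reference graph (assumed predicate label, for illustration).
reference = {
    ("Amsterdam", "capital_of", "Netherlands"),
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
}

# Hypothetical model output, invented for this example.
extracted = {
    ("Amsterdam", "capital_of", "Netherlands"),
    ("Paris", "capital_of", "France"),
    ("Rotterdam", "capital_of", "Netherlands"),
}

report = compare_triples(extracted, reference)
print(report["only_in_model"])
print(report["only_in_reference"])
```

A triple that appears only on the model side is not necessarily wrong, since reference graphs are incomplete; exact set matching is the simplest possible comparison and ignores entity linking and paraphrased predicates.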

Cross-lingual knowledge consistency. Whether multilingual models and public knowledge sources such as Wikipedia and Wikidata encode the same facts across languages — and what drives the gaps when they don’t. Cross-lingual disagreement is the diagnostic in its purest form: only the language changes, yet the answers do too. WILA-PopQA (KG-LLM Workshop @ LREC 2026) disentangles question language, entity language, and entity popularity, showing that the language of the question dominates factual recall.
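One simple way to quantify cross-lingual disagreement is pairwise agreement over the answers a model gives to the same question in different languages. The sketch below assumes the answers have already been normalised to language-independent entity identifiers; the function name pairwise_agreement and the identifiers ENTITY_A and ENTITY_B are hypothetical, and this is not the metric used in WILA-PopQA.

```python
from itertools import combinations
from typing import Dict


def pairwise_agreement(answers: Dict[str, str]) -> float:
    """Fraction of language pairs whose normalised answers are identical.

    `answers` maps a language code to the answer given in that language,
    already mapped to a language-independent identifier.
    """
    pairs = list(combinations(sorted(answers), 2))
    if not pairs:
        # Zero or one language: there is nothing to disagree with.
        return 1.0
    agreeing = sum(answers[a] == answers[b] for a, b in pairs)
    return agreeing / len(pairs)


# Hypothetical answers to one question asked in three languages.
answers = {"en": "ENTITY_A", "nl": "ENTITY_A", "de": "ENTITY_B"}
score = pairwise_agreement(answers)
print(round(score, 3))  # 0.333: only the en/nl pair agrees
```

A score of 1.0 means every language pair agrees. Agreement says nothing about correctness, so in practice it would be reported alongside accuracy against a reference answer.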

Retrieval-augmented generation with knowledge graphs. Using structured knowledge to ground retrieval and generation — from KG-based passage expansion for question answering to mapping the fast-growing GraphRAG landscape. RAG is knowledge translation at inference time: what the graph contributes depends on how its structure is verbalized for the model.

From Language to Logic

Autoformalization and deductive reasoning. Translating natural-language statements into formal representations (logic, queries) and studying how LLMs perform deductive reasoning once knowledge is made explicit.

Temporal knowledge and reasoning. Temporal knowledge is a particularly difficult case: texts underspecify validity intervals, knowledge graphs qualify them, and language models flatten them into timeless facts. ChronoSense (ACL 2025) evaluates interval-based temporal understanding via Allen relations, exposing gaps between knowing when events happened and reasoning about how they relate.
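For readers unfamiliar with Allen relations, the following minimal sketch classifies the relation between two closed intervals with integer endpoints. It is an illustrative implementation of the thirteen standard relations, not code from ChronoSense; the function name allen_relation and the hyphenated relation labels are choices made here.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint intervals, with or without a gap between them.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # Intervals that share one or both endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a_start < b_start else "overlapped-by"


print(allen_relation((1914, 1918), (1939, 1945)))  # before
print(allen_relation((1, 4), (2, 6)))              # overlaps
```

Knowing each interval's endpoints is enough to compute the relation mechanically; the gap described above is that a model may recall the endpoints and still fail to state the relation.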

Related projects: Autoformalization and Deductive Reasoning

Applied Translation: Official Statistics and Digital Health

Text-to-SQL over real-world statistical data. Working with Statistics Netherlands (CBS) on translating natural-language questions into SQL over complex statistical tables, where schema design encodes implicit context about populations and time frames. Home of the LOCuST benchmark: 2,244 real statistical tables across 22 domains, 2,567 annotated questions, in English and Dutch.
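To make the point about implicit context concrete, here is a small sketch using Python's built-in sqlite3 module. The table layout, the column names, the region label RegionA, and all numbers are invented for illustration and do not come from CBS data or from LOCuST. The table stores a "total" row next to its age-group breakdown for each year, a layout assumed here as typical of statistical tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population ("
    "region TEXT, year INTEGER, age_group TEXT, persons INTEGER)"
)
# Invented toy rows: each year has a total and its breakdown.
rows = [
    ("RegionA", 2020, "total", 100),
    ("RegionA", 2020, "0-17", 20),
    ("RegionA", 2020, "18+", 80),
    ("RegionA", 2021, "total", 110),
    ("RegionA", 2021, "0-17", 25),
    ("RegionA", 2021, "18+", 85),
]
conn.executemany("INSERT INTO population VALUES (?, ?, ?, ?)", rows)

# Naive reading of "How many people live in RegionA?":
# sums over both years and counts each total alongside its parts.
naive = conn.execute(
    "SELECT SUM(persons) FROM population WHERE region = ?", ("RegionA",)
).fetchone()[0]

# Context-aware reading: pin the time frame and the population scope.
scoped = conn.execute(
    "SELECT persons FROM population "
    "WHERE region = ? AND year = ? AND age_group = 'total'",
    ("RegionA", 2021),
).fetchone()[0]

print(naive)   # 420
print(scoped)  # 110
```

Both queries are valid SQL and both return a number. Only the second answers the question, because it makes explicit the year and population scope that the question leaves unstated and the schema encodes.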

Semantic harmonisation in digital health (UNIFIED). Knowledge translation under regulatory constraint: data integration and semantic harmonisation of patient-centred clinical-study endpoints derived from digital health technologies, across devices, studies, and clinical vocabularies. This is the domain where translation loss has clinical consequences. The work is part of the EU Innovative Health Initiative project UNIFIED.