Jan-Christoph Kalo
Assistant Professor at INDElab, University of Amsterdam
University of Amsterdam
Intelligent Data Engineering Lab (INDElab)
Amsterdam, Netherlands
Hi, I am Jan-Christoph Kalo, Assistant Professor at the University of Amsterdam in the Intelligent Data Engineering Lab (INDElab).
My research studies what happens at the boundaries between different representations of knowledge. The same fact can live as a sentence in text, a row in a statistical table, a triple in a knowledge graph, a SQL query, a logical formula, or somewhere in the parameters of a language model. Moving between these forms is never lossless, nor is it context-free: whether two representations actually encode the same knowledge depends on the population, time frame, or scope they assume, and integrating them means making those assumptions explicit. I work at the intersection of language models, knowledge graphs, and databases, studying the translations, integrations, and mismatches between these representations by combining semantic web, database, and NLP techniques.
Concretely, this spans information extraction and knowledge base construction, text-to-SQL over real-world statistical data (in collaboration with Statistics Netherlands / CBS), autoformalization and deductive reasoning with LLMs, cross-lingual consistency in large public knowledge sources such as Wikipedia and Wikidata, and probing what language models actually represent. A recurring methodological stance runs through this work: expressing the same knowledge in multiple forms and studying where the versions disagree is a diagnostic. It tells us what each representation captures, what it silently drops, and where context is doing hidden work.
Beyond research, I supervise MSc and PhD projects and teach databases and knowledge technologies in the Bachelor and Master programs at UvA.
news
| Date | News |
|---|---|
| Apr 22, 2026 | New paper at the KG-LLM Workshop (LREC 2026): A Wikidata-Based Framework to Measure Cross-Lingual Bias in Multilingual Large Language Models. We introduce WILA-PopQA, a popularity-matched multilingual benchmark across 9 languages, and disentangle three factors that multilingual probing benchmarks usually confound: the language of the question, the language of the entity, and entity popularity. Across 12 open-weight LLMs, the language of the question turns out to be the dominant factor, and matching it to the entity’s language does not reliably improve factual recall. |
| Sep 10, 2025 | Our paper on the robustness of deductive reasoning with LLMs has been accepted at ECAI 2025 (presentation coming soon; see the publications page). We study how small variations in prompts and inputs affect deductive reasoning, analyze common failure modes, and outline an evaluation setup for robustness. |
| Sep 10, 2025 | Published a short paper at ACL 2025: ChronoSense (see the publications page). ChronoSense evaluates temporal understanding in large language models using the time intervals of events and the relations between them (e.g., Allen's interval relations), highlighting current gaps in interval reasoning. |
selected publications
- KAMEL: Knowledge Analysis with Multitoken Entities in Language Models. In 4th Conference on Automated Knowledge Base Construction (AKBC 2022), 2022