Knowledge Base Construction from Language Models

A large language model can be studied as a compressed, implicit knowledge source. This thread asks what kind of knowledge source it is: how to extract its knowledge, how to evaluate what is extracted, and where it diverges from structured sources like Wikidata.

Methodological work includes KAMEL (AKBC 2022), a probing benchmark with multi-token entities (Kalo & Fichtel, 2022); Evaluating the Knowledge Base Completion Potential of GPT (EMNLP Findings 2023), a systematic study of LLMs as KB completion sources (Veseli et al., 2023); and KnowlyBERT (ISWC 2020), a hybrid architecture combining language models with knowledge graphs at query time (Kalo et al., 2020). Earlier work on Prompt Tuning or Fine-Tuning (AKBC 2021) (Fichtel et al., 2021) and Prompting as Probing (LM-KBC 2022) (Alivanistos et al., 2022) laid the groundwork for treating LMs as queryable KBs. The position paper Large Language Models and Knowledge Graphs: Opportunities and Challenges (TGDK 2023) maps the broader space (Pan et al., 2023).

Recent work extends this to multilingual settings: A Wikidata-Based Framework to Measure Cross-Lingual Bias in Multilingual LLMs (KG-LLM @ LREC 2026) introduces the WILA-PopQA benchmark and disentangles the effects of question language, entity language, and popularity on factual recall (Iferroudjene et al., 2026).

I co-organise the LM-KBC Challenge at ISWC, which has run since 2022.

References

2026

KG-LLM

A Wikidata-Based Framework to Measure Cross-Lingual Bias in Multilingual Large Language Models

Mouloud Iferroudjene, Lisa Poggel, Andrea Schimmenti, and 4 more authors

In Proceedings of the Workshop on Knowledge Graphs and Large Language Models (KG-LLM @ LREC 2026), 2026


2023

EMNLP

Evaluating the Knowledge Base Completion Potential of GPT

Blerta Veseli, Simon Razniewski, Jan-Christoph Kalo, and 1 more author

In Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

TGDK

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, and 8 more authors

Transactions on Graph Data and Knowledge (TGDK), 2023


2022

AKBC

KAMEL: Knowledge Analysis with Multitoken Entities in Language Models

Jan-Christoph Kalo and Leandra Fichtel

In 4th Conference on Automated Knowledge Base Construction (AKBC 2022), 2022

LM-KBC

Prompting as Probing: Using Language Models for Knowledge Base Construction

Dimitrios Alivanistos, Selene Baez Santamaría, Michael Cochez, and 3 more authors

In LM-KBC Challenge at ISWC 2022, 2022


2021

AKBC

Prompt Tuning or Fine-Tuning – Investigating Relational Knowledge in Pre-Trained Language Models

Leandra Fichtel, Jan-Christoph Kalo, and Wolf-Tilo Balke

In 3rd Conference on Automated Knowledge Base Construction (AKBC 2021), 2021


2020

ISWC

KnowlyBERT – Hybrid Query Answering over Language Models and Knowledge Graphs

Jan-Christoph Kalo, Leandra Fichtel, Philipp Ehler, and 1 more author

In International Semantic Web Conference (ISWC 2020), 2020
