Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

New Research Contribution – EMNLP 2025

The paper “Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs” by the ZHAW partners Gaye Colakoglu, Gürkan Solmaz, and Jonathan Fürst has been published in Findings of the Association for Computational Linguistics (EMNLP 2025).

The paper explores how Large Language Models (LLMs) can be used for information extraction from layout-rich documents, a key challenge when working with complex, unstructured data. This work is directly related to Unstructured Dataset Profiling, contributing to better understanding, structuring, and extracting value from heterogeneous document collections.

DOI: 10.18653/v1/2025.findings-emnlp.973

DataGEMS is a Research and Innovation Action funded by the European Union under the Horizon Europe Research and Innovation Programme via Grant Agreement No 101188416.
