The paper “Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs” by ZHAW partners Gaye Colakoglu, Gürkan Solmaz, and Jonathan Fürst has been published in Findings of the Association for Computational Linguistics (EMNLP 2025).
The paper explores how Large Language Models (LLMs) can be used for information extraction from layout-rich documents, a key challenge when working with complex, unstructured data. The work relates directly to Unstructured Dataset Profiling, helping to understand, structure, and extract value from heterogeneous document collections.