From Web Tables to Concepts: A Semantic Normalization Approach

Authors: Katrin Braunschweig, Maik Thiele, Wolfgang Lehner

Tags: 2015, conceptual modeling

Relational Web tables, embedded in HTML or published on data platforms, have become an important resource for many applications, including question answering or entity augmentation. To utilize the data, we require some understanding of what the tables are about. Previous research on recovering Web table semantics has largely focused on simple tables, which only describe a single semantic concept. However, there is also a significant number of de-normalized multi-concept tables on the Web. Treating these as single-concept tables results in many incorrect relations being extracted. In this paper, we propose a normalization approach to decompose multi-concept tables into smaller single-concept tables. First, we identify columns that represent keys or identifiers of entities. Then, we utilize the table schema as well as intrinsic data correlations to identify concept boundaries and split the tables accordingly. Experimental results on real Web tables show that our approach is feasible and effectively identifies semantic concepts.
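
The two steps named in the abstract (find key columns, then split along concept boundaries) can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in plain exact functional-dependency checks as a stand-in for the "intrinsic data correlations" mentioned above, and it ignores the table schema entirely. The function names (`is_unique`, `determines`, `split_concepts`) and the sample data in the usage note are hypothetical.

```python
def is_unique(rows, col):
    """True if no value repeats in the given column."""
    values = [row[col] for row in rows]
    return len(set(values)) == len(values)


def determines(rows, a, b):
    """True if column a functionally determines column b (a -> b)."""
    mapping = {}
    for row in rows:
        # setdefault stores the first b-value seen for each a-value
        if mapping.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def split_concepts(header, rows):
    """Split a de-normalized table into single-concept tables.

    Assumption: the first all-unique column is the key of the main
    concept (falls back to column 0 if none is unique). Any other
    column with repeated values that determines further columns is
    treated as the key of a separate concept.
    """
    n = len(header)
    main_key = next((c for c in range(n) if is_unique(rows, c)), 0)
    moved = set()      # columns moved out of the main table
    concepts = []      # column index lists, key column first

    for key in range(n):
        if key == main_key or key in moved or is_unique(rows, key):
            continue
        deps = [d for d in range(n)
                if d not in (key, main_key) and d not in moved
                and determines(rows, key, d)]
        if deps:
            concepts.append([key] + deps)
            moved.update(deps)  # the key itself stays as a foreign key

    main_cols = [c for c in range(n) if c not in moved]
    tables = []
    for cols in [main_cols] + concepts:
        # dict.fromkeys removes duplicate rows while keeping order
        projected = list(dict.fromkeys(
            tuple(row[c] for c in cols) for row in rows))
        tables.append(([header[c] for c in cols], projected))
    return tables
```

For example, given a made-up table with columns Book, Year, Publisher, City in which one publisher appears on two rows with the same city, `split_concepts` returns a main table (Book, Year, Publisher) and a second table (Publisher, City) with one row per publisher. Exact dependency checks are brittle on noisy Web data (a constant column, for instance, would be attached to the first repeated key), which is one reason an approach combining schema and correlation evidence, as the abstract describes, is needed in practice.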

Read the full paper here: https://link-springer-com.proxy2.hec.ca/chapter/10.1007/978-3-319-25264-3_18