SQL Data Profiling of Foreign Keys

Authors: Gillian Dobbie, Mozhgan Memari, Sebastian Link

Tags: 2015, conceptual modeling

Referential integrity is one of the three inherent integrity rules and can be enforced in databases using foreign keys. However, in many real-world applications referential integrity is not enforced because foreign keys remain disabled to ease data acquisition. Important applications such as anomaly detection, data integration, data modeling, indexing, reverse engineering, schema design, and query optimization all benefit from the discovery of foreign keys. Therefore, the profiling of foreign keys from dirty data is an important yet challenging task. We raise the challenge further by departing from previous research, in which null markers have been ignored. We propose algorithms for profiling unary and multi-column foreign keys in the real world, that is, under the different semantics for null markers of the SQL standard. While state-of-the-art algorithms perform well in the absence of null markers, we show that they perform poorly in their presence. Extensive experiments demonstrate that our algorithms perform as well in the real world as state-of-the-art algorithms perform in the idealized special case where null markers are ignored.
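To make the phrase "different semantics for null markers" concrete, the following is a minimal sketch of how a candidate multi-column foreign key could be validated under the three match options of the SQL standard (simple, partial, full). It is not the profiling algorithm from the paper, only a naive row-by-row check. The function name `satisfies_fk`, the tuple-based row representation, and the use of `None` for a null marker are assumptions made for illustration, and the referenced rows are assumed to contain no null markers.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical representation: a row is a tuple of column values,
# and None stands for an SQL null marker.
Row = Tuple[Optional[str], ...]


def satisfies_fk(child_rows: Iterable[Row],
                 parent_rows: Iterable[Row],
                 match: str = "simple") -> bool:
    """Naively check a candidate foreign key under one SQL match option.

    simple:  a referencing row with any null marker is accepted.
    partial: the non-null columns must agree with some referenced row.
    full:    a row must be entirely null or entirely non-null and matched.
    """
    if match not in ("simple", "partial", "full"):
        raise ValueError(f"unknown match option: {match}")

    parents = list(parent_rows)
    parent_set = set(parents)

    for row in child_rows:
        nulls = [value is None for value in row]
        if all(nulls):
            # An all-null referencing row is accepted under every option.
            continue
        if any(nulls):
            if match == "simple":
                continue
            if match == "full":
                return False
            # partial: look for a referenced row that agrees on
            # every non-null column of the referencing row.
            if not any(
                all(c is None or c == p for c, p in zip(row, ref))
                for ref in parents
            ):
                return False
        elif row not in parent_set:
            # No null markers: an exact match is required.
            return False
    return True
```

For example, with referenced rows `("a", "1")` and `("b", "2")`, a referencing row `("a", None)` is accepted under simple and partial semantics but rejected under full semantics, while `("c", None)` is accepted only under simple semantics. A check that ignores null markers cannot distinguish these cases.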

Read the full paper here: https://link-springer-com.proxy2.hec.ca/chapter/10.1007/978-3-319-25264-3_17