An information-theoretic analysis of worst-case redundancy in database design

0
158

Authors: Leonid Libkin, Solmaz Kolahi

Tags: 2008, conceptual modeling

Normal forms that guide the process of database schema design have several key goals such as elimination of redundancies and preservation of integrity constraints, such as functional dependencies. It has long been known that complete elimination of redundancies and complete preservation of constraints cannot be achieved simultaneously. In this article, we use a recently introduced information-theoretic framework, and provide a quantitative analysis of the redundancy/integrity preservation trade-off, and give techniques for comparing different schema designs in terms of the amount of redundancy they carry. The main notion of the information-theoretic framework is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. We start by providing a combinatorial criterion that lets us calculate, for a relational schema with functional dependencies, the lowest information content in its instances. This indicates how good the schema design is in terms of allowing redundant information. We then study the normal form 3NF, which tolerates some redundancy to guarantee preservation of functional dependencies. The main result provides a formal justification for normal form 3NF by showing that this normal form pays the smallest possible price, in terms of redundancy, for achieving dependency preservation. We also give techniques for quantitative comparison of different normal forms based on the redundancy they tolerate.

Read the full paper here: https://dl.acm.org/doi/10.1145/320434.320436