Maintaining Consistency of Probabilistic Databases: A Linear Programming Approach

Authors: Wilfred Ng, You Wu

Tags: 2010, conceptual modeling

The problem of maintaining consistency via functional dependencies (FDs) has been studied extensively in traditional database settings, and many probabilistic data models have been proposed in the past decades. However, how to maintain consistency in probabilistic relations via FDs remains unclear. In this paper, we clarify the concept of FDs in probabilistic relations and present an efficient chase algorithm, LPChase(r,F), for maintaining consistency of a probabilistic relation r with respect to an FD set F. LPChase(r,F) adopts a novel approach that uses a Linear Programming (LP) method to modify the probabilities of data values in r. Our approach has several benefits. First, LPChase(r,F) guarantees that the output is always the minimal change to r. Second, assuming that the expected size of an active domain consisting of data values with non-zero probability is fixed, we demonstrate the interesting result that the LP solving time in LPChase(r,F) decreases as the probabilistic data domains grow and becomes negligible for large domain sizes. The I/O time and modeling time, on the other hand, become stable even as the domain size increases.

Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-642-16373-9_22