Authors: Evguenia Altareva, Stefan Conrad
Tags: 2003, conceptual modeling
We propose a methodological framework for building a statistical integration model for heterogeneous data sources. We apply latent class analysis, a well-established statistical method, to investigate the relationships between entities in data sources as relationships among dependent variables, with the aim of discovering the latent factors that affect them. The latent factors are associated with real-world entities, which are unobservable in the sense that we do not know the real-world class memberships, only the stored data. The approach provides an evaluation of the uncertainties that accumulate during the integration process. The key parameter evaluated by the method is the probability of real-world class membership. Its value varies depending on the selection criteria applied in the pre-integration stages and in the subsequent integration steps. By adjusting the selection criteria and the integration strategies, the proposed framework makes it possible to improve data quality by optimizing the integration process.
Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-540-39648-2_5
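The key quantity named in the abstract, the probability of real-world class membership, can be illustrated with a minimal sketch of the standard latent class model with binary indicators. This is not the authors' implementation: the function name, the two-class setup ("same real-world entity" versus "different entity"), the reading of indicators as attribute agreements between two stored records, and all numeric values are assumptions chosen purely for illustration. The sketch only applies Bayes' rule under the usual latent class assumption that indicators are conditionally independent given the class.

```python
from math import prod


def class_membership_posterior(priors, item_probs, observed):
    """Posterior P(class | observed) for a latent class model.

    priors[c]        -- P(class = c), the latent class proportions
    item_probs[c][j] -- P(indicator j = 1 | class = c)
    observed[j]      -- observed binary indicator (0 or 1)

    Indicators are assumed conditionally independent given the class.
    """
    joint = []
    for prior, probs in zip(priors, item_probs):
        # Likelihood of the observed pattern under this class.
        likelihood = prod(p if y == 1 else 1.0 - p
                          for p, y in zip(probs, observed))
        joint.append(prior * likelihood)
    total = sum(joint)
    # Normalize the joint probabilities into a posterior distribution.
    return [j / total for j in joint]


# Hypothetical parameters (assumed, not taken from the paper):
# class 0 = "same real-world entity", class 1 = "different entity";
# indicator 0 = names agree, indicator 1 = addresses agree.
priors = [0.5, 0.5]
item_probs = [[0.9, 0.8],   # agreement is likely for the same entity
              [0.2, 0.1]]   # agreement is unlikely otherwise

# Both attributes agree for the pair of stored records.
posterior = class_membership_posterior(priors, item_probs, [1, 1])
print(posterior)
```

In a full latent class analysis the class proportions and conditional probabilities would themselves be estimated from the data, typically with an EM procedure; here they are fixed by hand so that the effect of changing a selection criterion (for example, which attributes serve as indicators) on the membership probability is easy to see.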