Authors: Birgitta Koenig-Ries, Michael Owonibi
Tags: 2014, conceptual modeling
The importance of quality-assured data in scientific analysis necessitates the inclusion of data quality management (DQM) functionality in research data repositories, in addition to their primary role of data storage, sharing and integration. Typically, the DQM workflow in data repositories is fixed and semi-automated for datasets whose structure and semantics are known a priori; for other types of datasets, however, DQM is either manual or minimal. In comparison, classical DQM methodology (especially in data warehousing research) has established standard, typically manually undertaken, DQM procedures for different types of data. Our proposal therefore aims to customize and semi-automate the classical DQM procedures for biodiversity data repositories. Rather than reviewing the scientific content of the data, we focus on technical data quality. Our proposed workflow includes DQM criteria specification, client- and server-side validation, data profiling, error detection analysis, data enhancement and correction, and quality monitoring.
Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-319-12256-4_17
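As a rough illustration of two of the workflow steps named above, DQM criteria specification with validation and data profiling, the sketch below checks tabular records against declared per-column rules and computes a simple column profile. It is a generic example under stated assumptions, not the authors' method: the names `Criterion`, `validate` and `profile`, and the biodiversity-style columns `species`, `latitude` and `abundance`, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Criterion:
    """Hypothetical technical-quality rule for one column."""
    column: str
    required: bool = True
    expected_type: Any = str  # a type or a tuple of types, as accepted by isinstance
    check: Optional[Callable[[Any], bool]] = None  # optional range/domain test
    description: str = ""


def _type_name(t: Any) -> str:
    # Readable name for a single type or a tuple of types.
    if isinstance(t, tuple):
        return " or ".join(x.__name__ for x in t)
    return t.__name__


def validate(records: list, criteria: list) -> list:
    """Return (row index, column, message) for every rule violation."""
    errors = []
    for i, row in enumerate(records):
        for c in criteria:
            value = row.get(c.column)
            if value is None or value == "":
                if c.required:
                    errors.append((i, c.column, "missing value"))
                continue
            if not isinstance(value, c.expected_type):
                errors.append((i, c.column, f"expected {_type_name(c.expected_type)}"))
                continue
            if c.check is not None and not c.check(value):
                errors.append((i, c.column, c.description or "check failed"))
    return errors


def profile(records: list) -> dict:
    """Per-column profile: missing count, distinct count, numeric min/max."""
    columns = sorted({key for row in records for key in row})
    result = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None and v != ""]
        stats = {"missing": len(values) - len(present), "distinct": len(set(present))}
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric:
            stats["min"] = min(numeric)
            stats["max"] = max(numeric)
        result[col] = stats
    return result


# Usage with hypothetical criteria and records.
criteria = [
    Criterion("species", expected_type=str),
    Criterion("latitude", expected_type=(int, float),
              check=lambda v: -90 <= v <= 90, description="latitude out of range"),
    Criterion("abundance", expected_type=int,
              check=lambda v: v >= 0, description="negative abundance"),
]
records = [
    {"species": "Quercus robur", "latitude": 50.9, "abundance": 12},
    {"species": "", "latitude": 123.0, "abundance": 3},
    {"species": "Fagus sylvatica", "latitude": 51.2, "abundance": "five"},
]
print(validate(records, criteria))
print(profile(records))
```

Keeping the criteria as declarative data, separate from the checking code, is one way to let repository users specify DQM criteria per dataset without changing the validator; the same rule list could in principle be evaluated both client-side and server-side.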