Authors: Ana León Palacio, Juan Carlos Casamayor Ródenas
Tags: 2018, conceptual modeling, Óscar Pastor López
The use of techniques such as Next Generation Sequencing increases our knowledge about the genomic risk of suffering a certain disease, improving our ability to provide an early diagnosis and thus an appropriate treatment for each patient. To provide an accurate diagnosis, clinicians must search the repositories of open data available to the research community. However, the vast number of heterogeneous and dispersed data sources that store information about gene-disease associations, as well as their variable level of quality, hinder the process of determining whether the variants found in the DNA sequence of a patient’s sample are clinically relevant. In this paper, we present a systematic method based on conceptual modeling and data quality management techniques to tackle these issues, with the aim of supporting the genomic diagnosis of a disease. To this end, we state the most prominent problems affecting repositories of open data for genomics. Then, we use a methodological approach for identifying what we call “smart data”: the relevant information hidden in the genomics data lake. Finally, to test and validate the proposed method, we apply it to a case study based on the clinical diagnosis of Crohn’s Disease.
Read the full paper here: https://link-springer-com.proxy2.hec.ca/chapter/10.1007/978-3-030-00847-5_44