Extending ER models to capture database transformations to build data sets for data mining

Authors: Carlos Ordonez, David Sergio Matusevich, Sofian Maabout, Wellington Cabrera

Tags: 2013, conceptual modeling

In a data mining project developed on a relational database, a significant effort is required to build a data set for analysis. The main reason is that, in general, the database has a collection of normalized tables that must be joined, aggregated and transformed in order to build the required data set. Such a scenario results in many complex SQL queries that are written independently of each other, in a disorganized manner. Therefore, the database grows with many tables and views that are not present as entities in the ER model, and similar SQL queries are written multiple times, creating problems in database evolution and software maintenance. In this paper, we classify potential database transformations, we extend an ER diagram with entities capturing database transformations, and we introduce an algorithm which automates the creation of such an extended ER model. We present a case study with a public database illustrating database transformations to build a data set to compute a typical data mining model.
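
As a rough illustration of the kind of transformation the abstract describes, the sketch below joins and aggregates two normalized tables into a data set with one row per customer. It is not taken from the paper: the tables customer and orders, their columns, the sample rows, and the helper names build_data_set and make_demo_db are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two normalized tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            region      TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            amount      REAL
        );
        INSERT INTO customer VALUES (1, 'north'), (2, 'south'), (3, 'north');
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.5), (12, 2, 7.0);
        """
    )
    return conn


def build_data_set(conn):
    """Join and aggregate the normalized tables into one row per customer."""
    cur = conn.execute(
        """
        SELECT c.customer_id,
               c.region,
               COUNT(o.order_id)            AS n_orders,
               COALESCE(SUM(o.amount), 0.0) AS total_amount
        FROM customer AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.region
        ORDER BY c.customer_id
        """
    )
    return cur.fetchall()


rows = build_data_set(make_demo_db())
for row in rows:
    print(row)
```

The result of such a query is typically stored as a new table or view, which is exactly the kind of derived object that the abstract notes is absent from a conventional ER model.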

Read the full paper here: https://pdf.sciencedirectassets.com/271546/1-s2.0-S0169023X14X00029/1-s2.0-S0169023X13001298/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEE0aCXVzLWVhc3QtMSJIMEYCIQDfFAI3hCknUfpFrJivpiPmyG3R2DBEyD4RNtIMxaAkTgIhAPcE3iBpCTMXCPLubb%2FHwJkLiRgdPb1rxbglDEE79%2FxbKrQDCGUQAxoMMDU5MDAzNTQ2ODY1IgxFl4MB%2B63rGxKGZFcqkQO%2Fp6Q2J5HDqMLeqwJ2IC3PkqAl4Dd8yPE%2FNAtFK7RB87Rk6Rfpfoh20VTLl23OAgbUqeGTh%2BOAYAzoDZYO67dEDVsYsBXf5Iy%2B%2FQSXswTjW0rhxFwvSD%2BheAAmC3Qqj2mWtVGmqcBYsHG4%2BfustdSN0WplOKYBTr1WfUwCsOiXEE8C5hurjeO%2BT0gv36MllD5Ls3KwHzA%2F6S7W2w5Gqk9NvRgLd9gKNQeVe0H0H34nDIk%2F2KV4f2lGTqHynrnsE755zAbIvfxvW4I%2F%2B0kJ4zqkUKpIk3MkGnUqO0uM9VnVDJ5DwlmJhHvrrfZDqqmNg6TgeiMnTqupH3c1se3Vx87XRY1M9gvrvTa0OSFtVyH2ASr1zS3kcboZUv1ATnP7wGY2GBiXYSpxYNTxuGNXnfBlslE6aevUxA2fm1xFJ7pkVoBdn93QZFCmN9UxG82zs4n%2BDkTk%2FhKIUHxUBGByJ961bHOhvGwAd3DE2JmH71tBn%2F24BbQb9wKAMl2K07otyAPkiX0GsmHLU5wT1lIIkXUt%2FDCM8J77BTrqAbaptFJ6U4sOdqNOYopiQnALPxLW0RuYW8q%2FnwkRjt7bHOOVXgf2XYFAAL7qlxFEM3rHU3tkQHACP5UQUnla9e4jf%2BrUBTqnmFJQsNszHh4yphSM%2Fb56kp44qw1%2BjBnUajEexnhYVLBb0SXVRsVfb%2F5teEGc8y%2BlOTO956X0RVbfKVRtZk7gwhncomQyae9KzHjr5fRhyVdhMXGHGswAdg9Vor0eBzlHlm1x6SUhmaHHhIAOzR1AXMjPMJjcgWESTUeo5OmwNG%2FjAenKS7Um64Fd6tJ9l44Fv6UjvqPKNsueQCKF8qgLjKnpbQ%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200920T211915Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIAQ3PHCVTY2QN6BRGE%2F20200920%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=08479b6837fda93f6f518bc4d7b573520c5b7a15ad55c4aa65507005e1f8de85&hash=5d2450a1a25ee7b36e8fffce0aa439fbe2c163f8584b4e8af4bde650dad0a03f&host=68042c943591013ac2b2430a89b270f6af2c76d8dfd086a07176afe7c76c2c61&pii=S0169023X13001298&tid=spdf-78cf9a04-1600-4fa1-995f-69d7f8f84f4a&sid=27939d14918fb14e6d7bc0d-c8b7bcc143a4gxrqa&type=client