Authors: Bipin Sakamuri, Eric Chaudhry, K. Passi, Mukesh Mohania, S. Bhowmick, Sanjay Madria
Tags: 2003, conceptual modeling
The availability of large amounts of heterogeneous, distributed web data makes it necessary, for many reasons, to integrate and query XML data from multiple XML sources. For example, many US government agencies, such as the IRS, INS, FBI, and CIA, are currently integrating their systems to deal with new security threats. These departments use legacy data stores that include relational databases, flat files, spreadsheets, HTML pages, and plain text. Similarly, many e-commerce companies sell similar products but represent their data using different XML schemas. When two such companies merge, or cooperate to serve customers, they need a uniform schema integration methodology [1,2]. Some applications, such as comparison shopping, require the illusion of a centralized, homogeneous information system. Such systems need a uniform data representation and access platform, which XML provides. However, XML schemas and data remain heterogeneous and express their constraints differently. To avoid the overhead of system integration and of system-specific data access mechanisms, applications should be given data in an integrated form. The idea is to use XML as an intermediate medium for integrating data from heterogeneous sources. Many current efforts generate XML views of data, presenting it only in XML while it remains stored internally in legacy databases. With wrappers, applications can view the data as XML instead of moving it from its original format into XML. However, wrappers fail if the structure of the data changes dynamically. Our approach has two phases: first, the integration of the local XML schemas into a global schema; second, the integration of the XML data that the local XML sources return in response to queries. A global schema eliminates data model differences by integrating the local schemas.
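The two phases can be illustrated with a minimal sketch. It assumes two hypothetical e-commerce sources that describe products with different XML schemas, and an assumed global schema of `<product source=".."><name/><price/></product>`. This is not the authors' method. The source documents, the global schema, and the functions `map_source_a`, `map_source_b`, and `query` are all made up for illustration. Phase 1 is modeled as one mapping function per local schema. Phase 2 evaluates a global query against each source at query time and merges the partial results, so no integrated copy of the data is ever stored.

```python
import xml.etree.ElementTree as ET

# Hypothetical local sources: two companies that sell similar products
# but use different XML schemas (element-based vs. attribute-based).
SOURCE_A = """<catalog>
  <item><name>Laptop</name><cost>999</cost></item>
  <item><name>Mouse</name><cost>25</cost></item>
</catalog>"""

SOURCE_B = """<products>
  <product title="Keyboard" price="45"/>
  <product title="Laptop" price="949"/>
</products>"""


# Phase 1 (sketch): map each local schema onto the assumed global schema.
# Each mapping yields (name, price) pairs from one local document.
def map_source_a(root):
    for item in root.iter("item"):
        yield item.findtext("name"), float(item.findtext("cost"))


def map_source_b(root):
    for prod in root.iter("product"):
        yield prod.get("title"), float(prod.get("price"))


# Hypothetical registry: source id -> (raw XML, schema mapping).
SOURCES = {"A": (SOURCE_A, map_source_a), "B": (SOURCE_B, map_source_b)}


def query(max_price):
    """Phase 2 (sketch): answer a global query by reading every source on
    demand and merging the matching results into one global-schema
    document. Nothing is materialized between queries."""
    merged = ET.Element("products")
    for source_id, (xml_text, mapping) in SOURCES.items():
        root = ET.fromstring(xml_text)  # parse the source at query time
        for name, price in mapping(root):
            if price <= max_price:
                prod = ET.SubElement(merged, "product", source=source_id)
                ET.SubElement(prod, "name").text = name
                ET.SubElement(prod, "price").text = f"{price:.2f}"
    return merged


if __name__ == "__main__":
    print(ET.tostring(query(100.0), encoding="unicode"))
```

Here `query(100.0)` returns the Mouse from source A and the Keyboard from source B in a single `<products>` document. Because each source is re-read at query time, structural changes in a source only require updating its mapping function, not re-integrating stored data. The two Laptop entries would stay separate under a higher price limit, because resolving such duplicates is outside the scope of this sketch.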
The heterogeneous XML data sources need not be physically stored in an integrated form. Integrating the XML data and storing it under the new integrated schema consumes extra resources and may duplicate data. Duplication, in turn, creates the problems of multiple updates and data inconsistency. For this reason, we present a dynamic mechanism that interfaces with the different XML data sources and presents an integrated view of them, rather than physically integrating the data.
Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-540-39648-2_48