Authors: Karen C. Davis, Prudhvi Janga
Tags: 2014, conceptual modeling
XML web data is heterogeneous in terms of content and tagging of information. Integrating, querying, and presenting heterogeneous collections presents many challenges. The structure of XML documents is useful for achieving these tasks; however, not every XML document on the web includes a schema. We propose and implement a framework for efficient schema extraction, integration, and relational schema mapping from heterogeneous XML documents collected from the web. Our approach uses the Schema Extended Context Free Grammar (SECFG) to model XML schemas and transform them into relational schemas. Unlike other implementations, our approach is also able to identify and transform many XML constraints into relational schema constraints while accepting input described by multiple XML schema languages, e.g., DTD or XSD, or by no XML schema at all. We compare our approach with other proposed approaches and conclude that we offer better functionality more efficiently and with greater flexibility.

Read the full paper here: https://link-springer-com.proxy2.hec.ca/chapter/10.1007/978-3-319-12206-9_7
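The abstract describes extracting structure from XML that may lack a schema and mapping it, constraints included, onto relational tables. As a rough illustration only, the Python sketch below infers per-element structure from a schema-less XML instance and emits SQL DDL using a simple inlining-style mapping. It is not the authors' SECFG-based method. The function names `infer_schema` and `to_ddl`, the mapping rules, and the sample document are hypothetical assumptions. Under these rules, repeated or structured elements become tables with an `id` key and a foreign key to their parent, single-occurrence text leaves are inlined as columns, and a child or attribute observed in every parent instance is marked `NOT NULL`. This is a loose analogue of carrying XML cardinality constraints into relational constraints.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document with no DTD or XSD attached.
SAMPLE = """
<library>
  <book isbn="111"><title>A</title><author>X</author><author>Y</author><year>2001</year></book>
  <book isbn="222"><title>B</title><author>Z</author></book>
</library>
"""


def infer_schema(root):
    """Infer per-tag structure from instance data alone (no DTD/XSD consulted):
    attribute presence counts, whether text occurs, and for each child tag the
    number of parent instances containing it and its maximum repeat count."""
    schema = defaultdict(lambda: {"count": 0, "attrs": defaultdict(int),
                                  "has_text": False, "children": {}})

    def visit(elem):
        info = schema[elem.tag]  # first access registers tags in pre-order
        info["count"] += 1
        for name in elem.attrib:
            info["attrs"][name] += 1
        if elem.text and elem.text.strip():
            info["has_text"] = True
        per_instance = defaultdict(int)
        for child in elem:
            per_instance[child.tag] += 1
            visit(child)
        for tag, n in per_instance.items():
            stats = info["children"].setdefault(tag, {"present": 0, "max": 0})
            stats["present"] += 1
            stats["max"] = max(stats["max"], n)

    visit(root)
    return schema


def to_ddl(root, schema):
    """Map the inferred structure to CREATE TABLE statements.
    Assumption: tag and attribute names are valid, non-reserved SQL
    identifiers and carry no XML namespaces."""
    repeated = {c for info in schema.values()
                for c, s in info["children"].items() if s["max"] > 1}

    def is_table(tag):
        info = schema[tag]
        return (tag == root.tag or bool(info["children"])
                or bool(info["attrs"]) or tag in repeated)

    parents = defaultdict(list)
    for p, info in schema.items():
        for c in info["children"]:
            parents[c].append(p)

    ddl = []
    for tag, info in schema.items():
        if not is_table(tag):
            continue  # inlined as a column of its parent table(s)
        cols = ["id INTEGER PRIMARY KEY"]
        # Foreign key to the parent; required only if the tag has a single parent tag.
        for p in parents[tag]:
            null = " NOT NULL" if len(parents[tag]) == 1 else ""
            cols.append(f"{p}_id INTEGER{null} REFERENCES {p}(id)")
        # Attributes present on every instance become NOT NULL columns.
        for a, n in info["attrs"].items():
            cols.append(f"{a} TEXT" + (" NOT NULL" if n == info["count"] else ""))
        if info["has_text"]:
            cols.append("value TEXT")
        # Single-valued leaf children; NOT NULL if present in every parent instance.
        for c, s in info["children"].items():
            if not is_table(c):
                cols.append(f"{c} TEXT" + (" NOT NULL" if s["present"] == info["count"] else ""))
        ddl.append(f"CREATE TABLE {tag} (\n  " + ",\n  ".join(cols) + "\n);")
    return "\n".join(ddl)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print(to_ddl(root, infer_schema(root)))
```

For the sample, `library`, `book`, and repeated `author` become tables, while `title` and `year` are inlined as columns of `book`. `isbn` and `title` are marked `NOT NULL` because every `book` has them, and `year` stays nullable. Constraints inferred this way reflect only the observed instances. When a DTD or XSD is available, it is the more reliable source for cardinality and optionality.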