Authors: David W. Embley, Wai Yin Mok
Tags: 2006, conceptual modeling
As XML data becomes more prevalent and as larger quantities of data find their way into XML documents, the need for well-organized XML data will only increase. One standard way of structuring data well is to reduce and, if possible, eliminate redundancy while making the storage structures as compact as possible. In this paper, we present a methodology for generating XML storage structures whose conforming XML documents are redundancy-free and, in most practical cases, also fully compact. Our methodology assumes the input is a conceptual-model hypergraph. For the special case in which every edge in the hypergraph is binary, we present a simple algorithm that is guaranteed to generate redundancy-free storage structures. We show, however, that generating a minimum number of redundancy-free storage structures is NP-hard. We therefore provide heuristics to guide the process and observe that these heuristics yield satisfactory solutions, which are often optimal. We then present a general algorithm for n-ary edges and show that it generates redundancy-free storage structures. The general algorithm must overcome several problems that do not arise in the special case.

Read the full paper here: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1644731