Authors: Haiming Chen, Xiaofan Wang
Tags: 2018, conceptual modeling
Since many XML documents are not accompanied by a schema, it is essential to devise algorithms for schema inference. In this paper, we extend the single-occurrence regular expressions (SOREs) to single-occurrence regular expressions with counting (ECsores) and give an inference algorithm for ECsores. First, we present a countable finite automaton (CFA). Then, we construct a CFA by converting the single-occurrence automaton (SOA) built for the given finite sample. Next, after the CFA recognizes the given finite sample, we obtain the minimum and maximum number of repetitions of the subexpressions derivable from the CFA, possibly updating them by using an optimization method. Finally we transform the CFA to an ECsore. Moreover, our algorithm can ensure the result is a minimal generalization (such generalization is also called descriptive generalization) of the given finite sample.Read the full paper here: https://link-springer-com.proxy2.hec.ca/chapter/10.1007/978-3-030-00847-5_15