Inferring Deterministic Regular Expression with Counting

0
72

Authors: Haiming Chen, Xiaofan Wang

Tags: 2018, conceptual modeling

Since many XML documents are not accompanied by a schema, it is essential to devise algorithms for schema inference. In this paper, we extend the single-occurrence regular expressions (SOREs) to single-occurrence regular expressions with counting (ECsores) and give an inference algorithm for ECsores. First, we present a countable finite automaton (CFA). Then, we construct a CFA by converting the single-occurrence automaton (SOA) built for the given finite sample. Next, after the CFA recognizes the given finite sample, we obtain the minimum and maximum number of repetitions of the subexpressions derivable from the CFA, possibly updating them by using an optimization method. Finally we transform the CFA to an ECsore. Moreover, our algorithm can ensure the result is a minimal generalization (such generalization is also called descriptive generalization) of the given finite sample.

Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-030-00847-5_15