Schema-Based Web Wrapping

0
49

Authors: Andrea Tagarelli, Sergio Flesca

Tags: 2004, conceptual modeling

An effective solution to automate information integration is represented by wrappers, i.e. programs which are designed for extracting relevant contents from a particular information source, such as web pages. Wrappers allow such contents to be delivered through a self-describing and easily processable representation model. However, most existing approaches to wrapper designing focus mainly on how to generate extraction rules, while do not weigh the importance of specifying and exploiting the desired schema of the extracted information. In this paper, we propose a new wrapping approach which encompasses both extraction rules and the schema of required information in wrapper definitions. We investigate the advantages of suitably exploiting extraction schemata, and we define a clean declarative wrapper semantics by introducing (preferred) extraction models for source HTML documents with respect to a given wrapper.

Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-540-30464-7_23