On the Automatic Extraction of Data from the Hidden Web

Authors: David W. Embley, Sai Ho Yau, Stephen W. Liddle

Tags: 2001, conceptual modeling

An increasing amount of Web data is accessible only by filling out HTML forms to query an underlying data source. While this is most welcome from a user perspective (queries are easy and precise) and from a data management perspective (static pages need not be maintained; databases can be accessed directly), automated agents have greater difficulty accessing data behind forms. In this paper we present a method for automatically filling in forms to retrieve the associated dynamically generated pages. Using our approach, automated agents can begin to systematically access portions of the “hidden Web.”
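
To make the idea of automatic form filling concrete, here is a minimal Python sketch of one plausible first step: parsing a form and assembling a query from the default values it already declares. It is an illustration only and is not taken from the paper. The names `FormFieldParser` and `default_query_url` are hypothetical, and the handling of fields is deliberately simplified (only the first form is read, and an `<option>` without a `value` attribute is treated as empty).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFieldParser(HTMLParser):
    """Collects the action, method, and default field values of the first form."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}       # field name -> default value
        self._select = None    # name of the <select> currently open
        self._in_form = False
        self._done = False     # set once the first form has been closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        a = dict(attrs)  # boolean attributes such as "checked" map to None
        if tag == "form":
            self._in_form = True
            self.action = a.get("action") or ""
            self.method = (a.get("method") or "get").lower()
        elif not self._in_form:
            return
        elif tag == "input" and a.get("name"):
            kind = (a.get("type") or "text").lower()
            # Buttons and file uploads carry no default query value.
            if kind in ("submit", "button", "reset", "image", "file"):
                return
            # Unchecked boxes and radio buttons are not submitted.
            if kind in ("checkbox", "radio") and "checked" not in a:
                return
            self.fields[a["name"]] = a.get("value") or ""
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
        elif tag == "option" and self._select:
            # The first option is the default unless one is marked selected.
            if self._select not in self.fields or "selected" in a:
                self.fields[self._select] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def default_query_url(page_url, html):
    """Returns (method, target URL, URL-encoded default field values)."""
    parser = FormFieldParser()
    parser.feed(html)
    target = urljoin(page_url, parser.action)
    return parser.method, target, urlencode(parser.fields)
```

For a GET form, an agent could append the encoded string to the target URL after a `?` and fetch the result; for a POST form, the same string would be sent as the request body. Choosing non-default values, and deciding when enough of the underlying data has been retrieved, are the harder problems and are left out of this sketch.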

Read the full paper here: https://link.springer.com/chapter/10.1007/3-540-46140-X_17