Authors: Jon Atle Gulla, Terje Brasethvik
Tags: 1999, conceptual modeling
When publishing documents on the Web, the user needs to describe and classify her documents for the benefit of later retrieval and use. This paper presents an approach to semantic document classification and retrieval based on Natural Language Processing and Conceptual Modeling. The Referent Model language is used in combination with a lexical analysis tool to define a controlled vocabulary for classifying documents. Documents are classified by means of sentences that contain the high-frequency words in the document that also occur in the domain model defining the vocabulary. The sentences are parsed using a DCG-like grammar, mapped into a Referent Model fragment and stored along with the document using RDF-XML syntax. The model fragment represents the connection between the document and the domain model and serves as a document index. The approach is being implemented for a document collection published by the Norwegian Center for Medical Informatics (KITH).
Read the full paper here: https://link.springer.com/chapter/10.1007/3-540-48054-4_26
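The sentence-selection step mentioned in the abstract (keeping sentences that contain frequent document words which also appear in the domain vocabulary) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_index_sentences`, the `min_count` threshold, and the plain regular-expression tokenizer are all assumptions made for illustration, and the sketch omits the lexical analysis tool, the DCG-like parsing, and the RDF-XML storage described in the paper.

```python
import re
from collections import Counter


def select_index_sentences(text, vocabulary, min_count=2):
    """Return the sentences of `text` that mention a frequent vocabulary term.

    Hypothetical helper: a term counts as "high frequency" here if it occurs
    at least `min_count` times in the document (an assumed threshold).
    """
    vocab = {term.lower() for term in vocabulary}

    # Naive tokenization: lowercase alphabetic runs, no stemming or lemmatization.
    words = re.findall(r"[a-z]+", text.lower())

    # Count only the words that also occur in the controlled vocabulary.
    counts = Counter(word for word in words if word in vocab)
    key_terms = {word for word, n in counts.items() if n >= min_count}

    # Split into sentences at whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Keep the sentences that share at least one word with the key terms.
    return [
        s for s in sentences
        if key_terms & set(re.findall(r"[a-z]+", s.lower()))
    ]
```

Called with a short text and a vocabulary such as `{"patient", "record"}`, the function returns only the sentences that mention a vocabulary term occurring at least twice in the text. In the approach described above, such sentences would then be parsed and mapped to a model fragment, which this sketch does not attempt.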