Harvesting RDF Meta-Data for a Semantic Portal

Status

finished by Peder Pfister

Introduction

The Semantic Web, as initiated by the World Wide Web Consortium (W3C), provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Tim Berners-Lee defines the Semantic Web as an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The meaning of information is formalized using the Resource Description Framework (RDF). In the Semantic Web, Web pages therefore offer both content in HTML format for human users and RDF meta-data that describes the content in a machine-processable way.

Looking at the meta-data of a single Web page, however, gives only a limited view of the information offered by a Web application. For querying and reasoning purposes it would be better to have the whole meta-data model of the Web application at hand. In this thesis we design and implement a Semantic Harvester infrastructure that downloads the meta-data from individual Semantic Web applications and integrates it into the knowledge base (KB) of a Semantic Portal. The KB can further be offered for querying as a Web Service.

Goal of this Diploma Thesis

The goal of this thesis is to develop the infrastructure that builds the KB of a Semantic Portal. You will define the architecture and implement the prototype of the Semantic Harvester. Individual Web applications can register at the Semantic Harvester. The harvester periodically downloads the meta-data from the registered Web applications. The downloaded meta-data must then be integrated and unified to build the global KB of the Semantic Portal. The KB is stored in an RDF database that is made accessible to the outside world via SPARQL queries.
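The register, download, and integrate cycle described above can be illustrated with a minimal Java sketch. It is not the thesis design: the names Triple, SemanticHarvester, register, harvestOnce, snapshot, and startPeriodic are hypothetical, the download step is stood in for by a Supplier, and the KB is a plain in-memory set instead of an RDF database. "Unification" here only means that identical triples from different applications are stored once.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical triple type; a real implementation would use the statement
// class of an RDF library instead.
record Triple(String subject, String predicate, String object) {}

// Hypothetical harvester sketch; names and structure are assumptions.
class SemanticHarvester {
    // Registered Web applications, keyed by URL. Each source stands in for
    // "download the current meta-data of this application".
    private final Map<String, Supplier<List<Triple>>> sources = new LinkedHashMap<>();

    // The global KB. A set stores identical triples from different sources once.
    private final Set<Triple> knowledgeBase = new LinkedHashSet<>();

    synchronized void register(String applicationUrl, Supplier<List<Triple>> source) {
        sources.put(applicationUrl, source);
    }

    // One harvesting pass over all registered sources.
    // Returns the number of triples that were new to the KB.
    synchronized int harvestOnce() {
        int added = 0;
        for (Supplier<List<Triple>> source : sources.values()) {
            for (Triple triple : source.get()) {
                if (knowledgeBase.add(triple)) {
                    added++;
                }
            }
        }
        return added;
    }

    // Immutable copy of the current KB contents.
    synchronized Set<Triple> snapshot() {
        return Set.copyOf(knowledgeBase);
    }

    // Runs harvestOnce() immediately and then every periodSeconds seconds.
    // The caller is responsible for shutting the returned scheduler down.
    ScheduledExecutorService startPeriodic(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::harvestOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

This sketch only ever adds triples. A real harvester would also have to decide how to handle meta-data that a registered application has removed or changed since the last pass, and how to reconcile different identifiers for the same resource.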

When working on the thesis you will...

  • gain deep insight into current Semantic Web standards (RDF, OWL, SPARQL).
  • design the software architecture of an object-oriented, distributed application.
  • get practice programming a medium-sized Java project when implementing the proposed architecture.
  • work on the integration of the downloaded meta-data into the KB of the Semantic Portal.
  • get experience with the Jena Java framework for building Semantic Web applications and other Java programming libraries.
  • work with RDF databases.