XML to RDF Transformation


Finished by Markus Fehlmann.

Final Thesis (PDF, 1 MB)


The Semantic Web, as initiated by the World Wide Web Consortium (W3C), provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Tim Berners-Lee defines the Semantic Web as an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The meaning of information is formalized using the Resource Description Framework (RDF). RDF is the W3C recommendation for representing machine-processable meta-data; the terms used in RDF statements are defined in ontologies. Ontologies formally define the concepts used in a domain and the relationships between these concepts. Ontologies can therefore be seen as the vocabulary used in RDF documents.

Since the Semantic Web is a rather new development, only a few documents are available in RDF format. Many existing documents, however, are encoded in XML. To make these documents accessible on the Semantic Web, we developed WEESA, a technique for transforming semi-structured XML documents into RDF. WEESA defines a mapping from XML elements and attributes to the concepts and relations defined in an ontology; this mapping can then be used to generate RDF descriptions from XML documents automatically.
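To make the idea of such a mapping concrete, here is a minimal sketch in Java. It is not the WEESA mapping format or implementation: the class name XmlToRdfSketch, the example.org URIs, and the sample XML layout are all assumptions made for illustration. The sketch uses only the standard Java XML and XPath libraries. It selects values from an XML document with XPath expressions and emits one RDF statement per selected value in N-Triples syntax.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch only; this is NOT the WEESA API or mapping syntax.
class XmlToRdfSketch {

    static final String RDF_TYPE =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /**
     * Maps an XML document to N-Triples statements.
     *
     * @param xml            the XML document as a string
     * @param subjectXPath   XPath expression whose string value is the subject URI
     * @param classUri       ontology class assigned to the subject (hypothetical URI)
     * @param propertyXPaths ontology property URI -> XPath selecting its values
     */
    static List<String> transform(String xml, String subjectXPath,
                                  String classUri,
                                  Map<String, String> propertyXPaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            List<String> triples = new ArrayList<>();
            String subject = "<" + xpath.evaluate(subjectXPath, doc) + ">";

            // State which ontology concept the XML document describes.
            triples.add(subject + " <" + RDF_TYPE + "> <" + classUri + "> .");

            // One statement per XML node selected for each ontology property.
            for (Map.Entry<String, String> entry : propertyXPaths.entrySet()) {
                NodeList nodes = (NodeList) xpath.evaluate(
                    entry.getValue(), doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    String value = nodes.item(i).getTextContent().trim();
                    triples.add(subject + " <" + entry.getKey() + "> \""
                        + escape(value) + "\" .");
                }
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalStateException("XML to RDF mapping failed", e);
        }
    }

    // Escapes backslashes and double quotes for an N-Triples string literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For example, given a hypothetical document such as <album id="42"><title>Sample</title></album>, the subject XPath concat('http://example.org/album/', /album/@id) and a map from http://example.org/ontology#title to /album/title, transform returns a type statement plus one title statement for the resource http://example.org/album/42. A real application would more likely build the statements with an RDF library such as Jena rather than by string concatenation.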

Goal of this Diploma Thesis

The goal of this thesis is first to analyze the existing WEESA sample mappings and to identify new mapping directives that simplify the mapping definitions for common problems. You will then integrate the new mapping directives into the existing WEESA architecture. The WEESA extensions should be implemented and evaluated on representative sample documents.

When working on the thesis you will...

  • gain deep insight into current (Semantic) Web standards (XML, XPath, DOM, RDF, OWL).
  • design the software architecture of an object-oriented application.
  • get practice programming a medium-sized Java project when implementing the proposed architecture.
  • get experience with the Jena Java framework for building Semantic Web applications and with standard XML programming libraries.
  • participate in a sourceforge.net open-source project.
  • receive close supervision from your advisers!

The envisioned outcome

A new version of the WEESA meta-data generator that supports new mapping directives, making the definition of WEESA mappings more efficient and more flexible.