IFI Colloquia: 26.09.2013

Recent advances in information extraction have paved the way for the automatic construction and growth of large semantic knowledge bases from Web sources. Knowledge bases such as DBpedia and YAGO today contain hundreds of millions of facts about real-world entities and the relationships among them, captured in the popular Resource Description Framework (RDF) format. However, the very nature of the underlying extraction techniques means that the resulting RDF knowledge bases may contain a significant amount of incorrect, incomplete, or even inconsistent factual knowledge, which makes efficient and reliable query answering over this kind of uncertain RDF data a challenge. Our query engine, called URDF, answers queries over uncertain RDF knowledge bases by combining Datalog-style deduction rules, consistency constraints, and probabilistic inference; this combination is the main subject of the talk. Specifically, by casting the above scenario into a probabilistic database setting, we develop a new top-k algorithm for query answering which, for the first time in the context of probabilistic databases, allows us to fully integrate data and confidence computations over this kind of probabilistic input data. Extensions of our framework include the automatic learning of deduction rules from RDF data sources, as well as the consideration of temporal deduction rules and consistency constraints over time-annotated, probabilistic facts.
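
To make the setting concrete, the following is a minimal sketch of deduction over uncertain triples. It is not URDF's algorithm. The facts, the rule livesIn(X, C) :- worksAt(X, O), locatedIn(O, C), and the function names are all hypothetical. It assumes that base facts are independent, so a conjunction multiplies confidences and alternative derivations are combined by noisy-or. Consistency constraints are left out, and the top-k step simply ranks all derived answers after computing every confidence, which is the naive baseline rather than an integrated computation.

```python
from collections import defaultdict
from heapq import nlargest
from math import prod

# Hypothetical uncertain facts: (subject, predicate, object) -> confidence.
FACTS = {
    ("Alice", "worksAt", "Inria"): 0.8,
    ("Inria", "locatedIn", "Paris"): 0.9,
    ("Alice", "worksAt", "Sorbonne"): 0.5,
    ("Sorbonne", "locatedIn", "Paris"): 1.0,
    ("Bob", "worksAt", "MPI"): 0.6,
    ("MPI", "locatedIn", "Saarbruecken"): 0.9,
}


def derive_lives_in(facts):
    """Apply the illustrative rule
        livesIn(X, C) :- worksAt(X, O), locatedIn(O, C)
    and collect, per derived fact, the confidence of each derivation."""
    derivations = defaultdict(list)
    for (person, p1, org), c1 in facts.items():
        if p1 != "worksAt":
            continue
        for (org2, p2, city), c2 in facts.items():
            if p2 == "locatedIn" and org2 == org:
                # Assumption: independent base facts, so multiply.
                derivations[(person, "livesIn", city)].append(c1 * c2)
    return derivations


def combine(confidences):
    """Noisy-or over alternative derivations (assumed independent)."""
    return 1.0 - prod(1.0 - c for c in confidences)


def top_k(facts, k):
    """Naive top-k: score every derived fact, then keep the k best."""
    scored = {
        fact: combine(confs)
        for fact, confs in derive_lives_in(facts).items()
    }
    return nlargest(k, scored.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for fact, confidence in top_k(FACTS, 2):
        print(fact, round(confidence, 3))
```

In this toy data, livesIn(Alice, Paris) has two derivations with confidences 0.72 and 0.5, which combine to about 0.86 and rank ahead of the single derivation for Bob. The noisy-or step is only valid when derivations share no base facts; with shared facts, the confidence has to be computed from the full lineage of the answer.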