IfI Colloquium:
March 5, 2015

Many scientific databases are now publicly available for querying and advanced data analytics. One prominent example is the Sloan Digital Sky Survey (SDSS) SkyServer, which offers data to astronomers, scientists, and the general public. Given such a large user base, it is worthwhile to identify the areas of the data space that are of interest to many users. This helps in understanding the public focus and the trending research directions in the subject the database describes, i.e., astronomy in the case of SkyServer. In a current research project, we study the problem of extracting and analyzing the access areas of user queries by analyzing the query logs of the database. To our knowledge, neither the concept of access areas nor how to extract them has been studied before. We address these shortcomings as follows. First, we propose a novel notion of access area which is independent of any specific database state. It makes it possible to detect interesting areas of the data space regardless of whether they already exist in the database content. Second, we present a detailed mapping of our notion to different query types. Applying this mapping to the SkyServer query log yields a transformed data set. Third, we propose a new distance function to analyze this data set. Applying DBScan with our distance function, we arrive at access areas that are interesting from the perspective of an astronomer. These areas occupy only a small fraction (in some cases less than 1%) of the data space and are accessed by many users. Some frequently accessed areas do not even exist in the space spanned by the available objects.
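To make the clustering step more concrete, here is a minimal sketch of DBScan run over query access areas with a pluggable distance function. The abstract does not specify the proposed distance function or the exact representation of an access area, so both are assumptions here: each access area is modeled as a hyperrectangle (one `(low, high)` interval per queried attribute, e.g. hypothetical `ra`/`dec` ranges), and `area_distance` is a hypothetical stand-in that sums the per-attribute gaps between two rectangles (0 where they overlap). It is not the distance function from the talk.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]
Area = Sequence[Interval]  # assumption: one (low, high) interval per attribute

NOISE = -1


def interval_gap(a: Interval, b: Interval) -> float:
    """Gap between two closed intervals; 0 if they overlap or touch."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


def area_distance(x: Area, y: Area) -> float:
    """Hypothetical distance: sum of per-attribute gaps between two hyperrectangles."""
    return sum(interval_gap(a, b) for a, b in zip(x, y))


def dbscan(points: Sequence[Area], eps: float, min_pts: int,
           dist: Callable[[Area, Area], float]) -> List[Optional[int]]:
    """Plain DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighborhood (includes the point itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: expand the cluster
                seeds.extend(j_nbrs)
    return labels


# Hypothetical access areas as (ra range, dec range) rectangles.
areas = [
    [(10.0, 12.0), (-1.0, 1.0)],
    [(11.0, 13.0), (0.0, 2.0)],
    [(12.5, 14.0), (0.5, 1.5)],
    [(200.0, 201.0), (40.0, 41.0)],
]
print(dbscan(areas, eps=0.5, min_pts=2, dist=area_distance))  # [0, 0, 0, -1]
```

The three overlapping or nearby rectangles form one dense access area, while the isolated query is labeled noise. Because DBScan only needs eps-neighborhood queries, any symmetric dissimilarity can be plugged in; note that this gap-based example does not satisfy the triangle inequality, so index structures that assume a metric would not apply to it.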

Klemens Böhm has been a full professor (chair of databases and information systems) at Karlsruhe Institute of Technology (KIT), Germany, since 2004. Prior to that, he was professor of applied informatics/data and knowledge engineering at the University of Magdeburg, Germany; senior research assistant at ETH Zürich, Switzerland; and research assistant at GMD -- Forschungszentrum Informationstechnik GmbH, Darmstadt, Germany. Current research topics at his chair are knowledge discovery and data mining, data privacy, and workflow management. Klemens places great emphasis on collaborations with other scientific disciplines and with industry.