Statistical Relational Learning
Relational data mining techniques play an important role in many disciplines, such as information science, sociology, bioinformatics, and economics. They provide new ways to explore large corpora of data that capture dyadic relationships, interactions, or links between documents, humans, genes, or financial institutions. However, we increasingly have access to complex data that capture more than just dyadic relations. Examples include multi-relational data, time-stamped relations, relational data with noise, and sequential data. The question of when a graph abstraction of such complex relational data is justified has not been answered satisfactorily.
To address this problem, we develop new algorithmic and statistical data mining techniques for relational data with complex characteristics. We are particularly interested in new ways to infer patterns in time-stamped and sequential data on networks. In recent work, we developed a novel technique (i) to test when a network abstraction of such data is justified, and (ii) to infer optimal higher-order graphical models that generalize network-analytic methods. The technique has been implemented in the Python package pathpy, which is available on GitHub.
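The basic idea behind testing a network abstraction can be illustrated with a small, self-contained sketch. It does not use pathpy and is not the multi-order model selection from the publications below. It only compares a first-order Markov chain (paths are explained by links alone) with a second-order one (the next node also depends on the previous node) using a likelihood ratio. The function names `log_likelihood` and `likelihood_ratio` and the toy paths are assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihood(paths, order):
    """Log-likelihood of observed paths under a maximum-likelihood Markov chain.

    Only transitions with at least two preceding nodes are scored, so that
    models of order 1 and order 2 are evaluated on exactly the same transitions.
    """
    transition_counts = Counter()  # (history, next_node) -> count
    history_counts = Counter()     # history -> count
    for path in paths:
        for i in range(2, len(path)):
            history = tuple(path[i - order:i])
            transition_counts[(history, path[i])] += 1
            history_counts[history] += 1
    return sum(
        count * math.log(count / history_counts[history])
        for (history, _), count in transition_counts.items()
    )


def likelihood_ratio(paths):
    """Test statistic 2 * (logL of second-order model - logL of first-order model)."""
    return 2.0 * (log_likelihood(paths, 2) - log_likelihood(paths, 1))


# Hypothetical toy data: in the first set, the node visited before 'c'
# fully determines the node visited after it; in the second set it does not.
correlated_paths = [("a", "c", "d")] * 10 + [("b", "c", "e")] * 10
mixed_paths = (
    [("a", "c", "d")] * 5 + [("a", "c", "e")] * 5
    + [("b", "c", "d")] * 5 + [("b", "c", "e")] * 5
)

stat_correlated = likelihood_ratio(correlated_paths)
stat_mixed = likelihood_ratio(mixed_paths)
print("correlated paths:", stat_correlated)
print("mixed paths:", stat_mixed)
```

Both toy data sets yield the same links and link weights, so a plain graph cannot tell them apart. A large statistic indicates that the first-order (network) model discards information that a second-order model retains. Turning the statistic into a p-value additionally requires the degrees of freedom of both models, which this sketch omits.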
Selected publications
- I Scholtes: When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks. In KDD'17 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Nova Scotia, Canada, August 13-17, 2017
- G Casiraghi, V Nanumyan, I Scholtes, F Schweitzer: From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles. In Social Informatics. SocInfo 2017, LNCS, Vol. 10540, 2017
- G Casiraghi, V Nanumyan, I Scholtes, F Schweitzer: Generalized Hypergeometric Ensembles: Statistical Hypothesis Testing in Complex Networks. arXiv:1607.02441, July 2016