Publications

Public profiles: DBLP, Google Scholar, and Microsoft academic.

2020

  • The Relational Data Borg is Learning (Keynote) [arxiv] Journal
    Dan Olteanu.
    In PVLDB 13(12): 3503 - 3516, 2020.
  • LMFAO: An Engine for Batches of Group-By Aggregates [arxiv, video] Demo
    Maximilian Schleich, Dan Olteanu.
    In PVLDB 13(12): 2945 - 2948, 2020.
  • Rk-means: Fast Clustering for Relational Data. [arxiv] Conference
    Ryan Curtin, Ben Moseley, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, Maximilian Schleich.
    In Artificial Intelligence and Statistics (AISTATS), Palermo, Italy, June 2020.
    arXiv report 1910.04939, October 2019.
  • F-IVM: Learning over Fast Evolving Relational Data. [arxiv] Demo
    Milos Nikolic, Haozhe Zhang, Ahmet Kara, Dan Olteanu.
    In ACM SIGMOD, Portland, Oregon, June 2020.
  • Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries. [arxiv, slides] Conference
    Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang.
    In ACM Principles of Database Systems (PODS), Portland, Oregon, June 2020.
    arXiv report 1907.01988, July 2019.
  • Multi-Layer Optimizations for End-to-End Data Analytics. [arxiv] Conference
    Amir Shaikhha, Maximilian Schleich, Alexandru Ghita, and Dan Olteanu.
    In Code Generation and Optimization (CGO), San Diego, February 2020.
    arXiv report 2001.03541, January 2020.
  • Functional Aggregate Queries with Additive Inequalities. [arxiv] Journal
    Mahmoud Abo Khamis, Ryan Curtin, Benjamin Moseley, Hung Ngo, XuanLong Nguyen, Dan Olteanu and Maximilian Schleich.
    In ACM Transactions on Database Systems (TODS), 2020 (accepted September 2020).
    arXiv report 1812.09526, September 2020.
  • Maintaining Triangle Queries under Updates. [arxiv] Journal
    Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, Haozhe Zhang.
    In ACM Transactions on Database Systems (TODS), 2020 (accepted April 2020). Special issue of best papers at ICDT 2019.
    arXiv report 2004.03716, April 2020.
  • In-Database Learning with Sparse Tensors and Functional Dependencies. [arxiv] Journal
    Mahmoud Abo Khamis, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich.
    In ACM Transactions on Database Systems (TODS), 2020 (accepted Dec 2019). Special issue of best papers at PODS 2018.
    arXiv report 1703.04780, Nov 2018.
  • Proceedings of the 23nd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, March 30 - April 02, 2020. [contents] Editor
    Angela Bonifati, Yongluan Zhou, Marcos Antonio Vaz Salles, Alexander Böhm, Dan Olteanu, George H. L. Fletcher, Arijit Khan, Bin Yang.

2019

  • Learning Models over Relational Databases. [keynote] Conference Keynote
    Dan Olteanu.
    In Int Conf on Database Theory (ICDT), Lisbon, March 2019.
  • Counting Triangles under Updates in Worst-Case Optimal Time. [arxiv, talk] Conference Best Paper Award
    Ahmet Kara, Hung Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang.
    In Int Conf on Database Theory (ICDT), Lisbon, March 2019.
    Extended version in arXiv report 1804.02780, April 2018.
  • Boolean Tensor Decomposition for Conjunctive Queries with Negation. [arxiv, talk] Conference
    Mahmoud Abo Khamis, Hung Ngo, Dan Olteanu, and Dan Suciu.
    In Int Conf on Database Theory (ICDT), Lisbon, March 2019.
    Technical report, Dec 2017.
  • A Layered Aggregate Engine for Analytics Workloads. [arxiv] Conference
    Maximilian Schleich, Dan Olteanu, Mahmoud Abo Khamis, Hung Ngo and XuanLong Nguyen.
    In ACM SIGMOD, Amsterdam, July 2019.
  • On Functional Aggregate Queries with Additive Inequalities. [arxiv] Conference
    Mahmoud Abo Khamis, Ryan Curtin, Benjamin Moseley, Hung Ngo, XuanLong Nguyen, Dan Olteanu and Maximilian Schleich.
    In ACM Principles of Database Systems (PODS), Amsterdam, July 2019.
    Extended version in arXiv report 1812.09526, December 2018.
  • Learning Models over Relational Data: A Brief Tutorial. [arxiv] Conference
    Maximilian Schleich, Dan Olteanu, Mahmoud Abo-Khamis, Hung Q. Ngo, and XuanLong Nguyen.
    In Scalable Uncertainty Management (SUM), Compiègne, December 2019.

2018

  • Incremental Techniques for Large-Scale Dynamic Query Processing. [ pdf] Conference
    Iman Elghandour, Ahmet Kara, Dan Olteanu, and Stijn Vansummeren.
    Tutorial In CIKM, Turin (Italy), Oct 2018.
  • Incremental View Maintenance with Triple Lock Factorisation Benefits. [talk, arxiv] Conference
    Milos Nikolic and Dan Olteanu.
    In ACM SIGMOD, Houston (USA), June 2018.
    Extended version in arXiv technical report 1703.07484, March 2017.
  • In-Database Learning with Sparse Tensors. [talk, video, arxiv] Conference
    Mahmoud Abo Khamis, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich.
    In ACM Principles of Database Systems (PODS), Houston, June 2018.
    Extended version in arxiv technical report 1703.04780, March 2017.
  • AC/DC: In-Database Learning Thunderstruck. [arxiv] Workshop
    Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, Maximilian Schleich.
    In 2nd Workshop on Data Management for End-to-End Machine Learning (DEEM@SIGMOD), Houston, June 2018.
    arXiv report 1803.07480, Mar 2018.
  • Covers of Query Results. [arxiv, talk] Conference
    Ahmet Kara and Dan Olteanu.
    In Int Conf on Database Theory (ICDT), Vienna, March 2018.
    Extended version in arxiv report, Sept 2017.
  • Towards Deterministic Decomposable Circuits for Safe Queries. [pdf, code] Workshop
    Mikaël Monet and Dan Olteanu.
    In Alberto Mendelzon Workshop (AMW), Cali (Colombia), May 2018.
  • Counting Triangles under Updates (short informal paper). Workshop
    Ahmet Kara, Hung Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang
    In Alberto Mendelzon Workshop (AMW), Cali (Colombia), May 2018.
    Extended version in arxiv 1804.02780.
  • Counting Triangles under Updates (short informal paper). Workshop
    Ahmet Kara, Hung Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang
    In Conférence sur la Gestion de Données – Principes, Technologies et Applications (BDA), Bucharest (Romania), Oct 2018.
    Extended version in arxiv 1804.02780.
  • Alberto Mendelzon Workshop on Foundations of Data Management (AMW 2018), Proceedings. [web] Editor
    Editors: Barbara Poblete, Dan Olteanu.
     

2017

  • Special Issue on In-database Analytics. [ pdf , web , call] Editor
    Dan Olteanu and Florin Rusu.
    In Distributed Parallel Databases, Springer, Sept 2017.
  • Declarative Probabilistic Programming with Datalog. [ pdf ] Journal
    Vince Barany and Balder ten Cate and Benny Kimelfeld and Dan Olteanu and Zografoula Vagena.
    In ACM Transactions on Database Systems (TODS), Aug 2017.
    ACM TODS special issue of best papers at ICDT 2016.
  • Technical Perspective: Juggling Functions inside a Database. [pdf] Journal
    Dan Olteanu.
    In SIGMOD Record, March 2017.
  • In-Database Factorized Learning. [pdf] Workshop
    Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich.
    In Alberto Mendelzon Workshop (AMW), Montevideo, June 2017.
  • Uncertain Data Models. [pdf] Chapter
    Christoph Koch and Dan Olteanu.
    In Encyclopedia of Database Systems, vol. 2, 2017.
    Technical report, July 2014.
  • Query Processing over Uncertain Data. [pdf] Chapter
    Nilesh Dalvi and Dan Olteanu.
    In Encyclopedia of Database Systems, vol. 2, 2017.
    Technical report, July 2014.

2016

  • Worst-Case Optimal Join at a Time. [pdf] Tech Report
    Radu Ciucanu and Dan Olteanu.
    Technical report. First version: Nov 2015. Latest version: March 2016.
  • Factorized Databases. [pdf] Journal
    Dan Olteanu and Maximilian Schleich.
    In SIGMOD Record (Database Principles Column), vol. 45, no. 2, June 2016.
  • F: Regression Models over Factorized Views. [pdf, poster] Demo
    Dan Olteanu and Maximilian Schleich.
    In PVLDB 9(13), New Delhi, Sept 2016.
  • Learning Linear Regression Models over Factorized Joins. [pdf(updated), poster, talk] Conference
    Maximilian Schleich and Dan Olteanu and Radu Ciucanu.
    In ACM SIGMOD, San Francisco, June 2016.
  • Factorized Databases: A Knowledge Compilation Perspective. [pdf] Workshop
    Dan Olteanu.
    In BeyondNP, AAAI workshop, Phoenix, April 2016.
  • Declarative Probabilistic Programming with Datalog. [pdf, arxiv] Conference
    Vince Barany and Balder ten Cate and Benny Kimelfeld and Dan Olteanu and Zografoula Vagena.
    In Int Conf on Database Theory (ICDT), Bordeaux, March 2016.
    Extended version in arxiv, December 2014.
    Probabilistic-programming Datalog can express probabilistic models concisely and declaratively in a Datalog extension with probability distributions as first-class citizens.
  • ENFrame: A Framework for Processing Probabilistic Data. [pdf] Journal
    Dan Olteanu and Sebastiaan van Schaik.
    In ACM Transactions on Database Systems (TODS). 41(1): 3, 2016.
    ACM TODS special issue of best papers at EDBT 2014.
    Submitted Feb 2015, Accepted December 2015.
    Main reference article for the ENFrame programming framework for probabilistic data, with a focus on multi-core probability computation of event networks.
  • Dichotomies for Queries with Negation in Probabilistic Databases. [pdf] Journal
    Robert Fink and Dan Olteanu.
    In ACM Transactions on Database Systems (TODS). 41(1): 4, 2016.
    Submitted Nov 2014, Accepted October 2015.
    Extended version of the PODS 2014 paper.
    The tractability frontier of two classes of relational algebra queries in tuple-independent probabilistic databases: (1) queries with join, projection, selection, and negation, but without repeating relation symbols and union; (2) quantified queries that express the following binary relationships amongst sets of entities: set division, set inclusion, set equivalence, and set incomparability. Each query in the two classes has either polynomial-time or #P-hard data complexity and the tractable queries can be recognised efficiently.

2015

  • Live Programming Support in the LogicBlox System: A MetaLogiQL Approach. [pdf] Journal
    TJ Green, Dan Olteanu, Geoffrey Washburn
    In PVLDB:8(12), Hawaii, Sept 2015.
    Yo Dawg, We Heard You Like Datalog Engines, so We Put a Datalog Engine in Your Datalog Engine so You Can Derive While You Derive!
  • The Design and Implementation of the LogicBlox System. [pdf] Conference
    Molham Aref, Balder ten Cate, TJ Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd Veldhuizen, Geoffrey Washburn
    In ACM SIGMOD, Melbourne, June 2015.
    Overview of design considerations behind the LogicBlox system and highlight of innovative aspects, including: LogiQL, a unified and declarative language based on Datalog; the use of purely functional data structures; novel join processing strategies; advanced incremental maintenance and live programming facilities; a novel concurrency control scheme; and built-in support for descriptive and predictive analytics.
  • Size Bounds for Factorised Representations of Query Results. [pdf] Journal
    Dan Olteanu and Jakub Závodný.
    Submitted July 2013, Accepted July 2014.
    In ACM Transactions on Database Systems (TODS), 40(1):2, 2015.
    Main reference for the theoretical foundation of factorized databases: Tight asymptotic bounds for size of factorized results of conjunctive queries; worst-case optimal join evaluation with factorized results; size gaps when shifting from relations to factorizations and to factorizations with subexpression sharing.
  • PPDL: Probabilistic Programming with Datalog. [arxiv] Workshop
    Balder ten Cate, Benny Kimelfeld, and Dan Olteanu.
    In Alberto Mendelzon Workshop (AMW), Lima, April 2015.
    Short version of technical report, arXiv, December 2014.
    Probabilistic-programming Datalog can express probabilistic models concisely and declaratively in a Datalog extension with probability distributions as first-class citizens.

2014

  • Short articles on Relational compression beyond Zip and Live programming with datalog. [ full edition] Journal
    Dan Olteanu.
    In Inspired Research newsletter (7th edition), page 7, October 2014.
  • A dichotomy for non-repeating queries with negation in probabilistic databases. [pdf, poster, talk] Conference
    Robert Fink and Dan Olteanu.
    In Principles of Database Systems (PODS), Snowbird, June 2014.
    The data complexity of any (relational algebra without repeating relation symbols and union) query Q on tuple-independent databases is polynomial time if Q is hierarchical and #P-hard otherwise. The tractable queries are the hierarchical ones and can be recognized efficiently. This extends a well-known dichotomy for conjunctive queries without self-joins to the case of queries with negation.
  • Probabilistic Data Programming with ENFrame. [pdf] Journal
    Dan Olteanu and Sebastiaan van Schaik.
    In IEEE Data Engineering Bulletin, 2014.
    Overview of the ENFrame programming framework for probabilistic data spelling out design decisions and challenges lying ahead.
  • ENFrame = (Programs + Queries) / Probabilistic Data. [pdf, poster] Workshop
    Dan Olteanu and Sebastiaan van Schaik.
    In Big Uncertain Data (BUDA), workshop at PODS, Snowbird, June 2014.
    Short version of the ENFrame overview paper in IEEE Data Eng. Bul. 2014.
  • ENFrame: A Platform for Processing Probabilistic Data. [pdf, arxiv] Conference
    Sebastiaan van Schaik, Dan Olteanu and Robert Fink.
    In Extending Data Base Technology (EDBT), Athens, March 2014.
    Invited to ACM TODS special issue of best papers at EDBT 2014.
    ENFrame is a unified data processing platform for querying and processing probabilistic data with low-level entry for users. User programs in Python and oblivious of underlying probabilistic data and models. Expressive language for probabilistic events in input and computation traces. Approximate inference for networks on interconnected probabilistic events via variable elimination on multi-cores.
  • Parallel OWL 2 RL Materialisation in Centralised, Main-Memory RDF Systems. [pdf] Workshop
    Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks and Dan Olteanu.
    In DL, Vienna, July 2014.
    The RDFox datalog engine (see AAAI'14 paper below) used for materialization of OWL RL Knowledge Bases.
  • Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. [pdf] Conference
    Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks and Dan Olteanu.
    In AAAI, Quebec, July 2014.
    RDFox: Parallel materialization of datalog programs in centralized, main-memory, multi-core RDF systems. RDF indexing data structure supporting efficient, mostly lock-free parallel updates.

2013

  • Aggregation and Ordering in Factorised Databases. [pdf, talk] Journal
    Nurzhan Bakibayev, Tomas Kocisky, Dan Olteanu, and Jakub Závodný.
    In PVLDB 2013 vol 6. Presented at VLDB, Hongzhou, Sept 2014.
    We extend the FDB engine for select-project-join queries to evaluate queries with aggregates, group-by, and order-by clauses on relational data, where the input data and the intermediate/final results may be factorized.
  • Pigora: An Integration System for Probabilistic Data. [pdf, poster] Demo
    Dan Olteanu, Lampros Papageorgiou, and Sebastiaan van Schaik.
    In IEEE Int Conf on Data Engineering (ICDE), Brisbane, March 2013.
    Pigora provides a queryable uniform interface (a mediated relational schema) to heterogeneous probabilistic data, such as Bayesian networks, probabilistic c-tables, stochastic automata. Query processing is supported either by delegating sub-queries to engines specific to the probabilistic formalism of a data source (eg, MayBMS/SPROUT for probabilistic c-tables, SMILE for Bayesian networks, Staccato for stochastic automata) and with varying degrees of query support or by translating all data sources into one probabilistic formalism followed by evaluating the query using an engine dedicated to that formalism.
  • Anytime Approximation in Probabilistic Databases. [pdf] Journal
    Robert Fink, Jiewen Huang, and Dan Olteanu.
    In Very Large Data Bases Journal (VLDBJ), 2013.
    We describe the approximation algorithm for computing the probability of propositional formulas over discrete random variables used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases. It puts together research previously reported in ICDT'11 and ICDE'10 and '11 on (i) model-based propositional bounds (least upper bounds and greatest lower bounds) with tractable probabilistic inference and (ii) incremental knowledge compilation of propositional formulas. Aspects of this work were used and extended by various researchers for top-k, aggregates, probabilistic XML, and explanation and sensitivity analysis.
  • Big Data - 29th British National Conference on Databases (BNCOD 2013), Proceedings. [web] Editor
    Editors: Georg Gottlob, Giovanni Grasso, Dan Olteanu, Christian Schallhart.
    BNCOD is a venue for the presentation and discussion of research papers on a broad range of topics related to data-centric computation. The theme of BNCOD 2013 was Big Data. It encompases a growing need to manage data that is too big, too fast, or too hard for the existing technology. The conference attracted 42 complete submissions from 14 different African, European, South and North American coutries. There were 20 accepted papers on topics such as query and update processing, relational storage, benchmarking, XML query processing, Big Data, spatial data, indexing, data extraction and social networks. The conference programme also included three keynote talks, two tutorials, and one panel session.
  • Letter from the Associate Editors. Journal
    Dan Olteanu and Divesh Srivastava.
    In PVLDB 6(5): i-vii (2013)
  • Report on the first Workshop on Innovative Querying of Streams. [pdf] Journal
    Michael Benedikt and Dan Olteanu.
    In ACM SIGMOD Record, June 2013.
    Stream processing represents a thriving area of research across the algorithms, databases, networking, programming languages, and systems research communities. This workshop overviews recent developments of query processing on uncertain/XML/distributed/semantic web data streams and of streaming frameworks and systems.

2012

  • Demonstration of the FDB Query Engine for Factorised Databases. [pdf, poster] Demo
    Nurzhan Bakibayev and Dan Olteanu and Jakub Závodný.
    System demonstration. In Very Large Data Bases (PVLDB), 5(12): 1950-1953. Istanbul, 2012.
    FDB is demonstrated using data sets from IMDB, DBLP, and NELL repository of web data.
  • DAGger: Clustering Correlated Uncertain Data. [pdf, poster] Demo
    Dan Olteanu and Sebastiaan van Schaik.
    System demonstration. In ACM SIGKDD Conf on Knowledge Discovery and Data Mining (KDD), Beijing, 2012.
    DAGger supports k-medoids clustering of uncertain data represented using probabilistic c-tables, which allow for arbitrary correlations in the input data. It can compute exact and approximate probabilities with error guarantees for the clustering output.
  • FDB: A Query Engine for Factorised Relational Databases. [pdf, arxiv] Journal
    Nurzhan Bakibayev and Dan Olteanu and Jakub Závodný.
    In Very Large Data Bases (PVLDB), 5(11):1232-1243. Istanbul, 2012.
    FDB can evaluate select-project-join queries on relational data, where the input data and the intermediate/final results may be factorized. FDB can outperform relational engines by orders of magnitude due to its aggresive lossless compression of relational data (input and/or intermediate/final results).
  • Aggregates in Probabilistic Databases via Knowledge Compilation. [local, pdf, talk] Journal
    Robert Fink, Larisa Han, and Dan Olteanu.
    In Very Large Data Bases (PVLDB), 5(5):490-501. Istanbul, 2012.
    CoRR technical report abs/1201.6569.
    We study positive relational algebra queries with aggregates on a representation system for probabilistic data based on the algebraic structures of semiring and semimodule. The query evaluation technique compiles semimodule and semiring expressions into decomposition trees. We give syntactic characterisations of tractable queries with aggregates by exploiting the connection between query tractability and polynomial-time decomposition trees.
  • Factorised Representations of Query Results: Size Bounds and Readability. [pdf, talk, earlier arxiv] Conference
    Dan Olteanu and Jakub Závodný.
    In Int Conf on Database Theory (ICDT), Berlin, 2012.
    First paper on factorized databases, introducing characterizations of asymptotically optimal bounds for (i) the size of factorized results of conjunctive queries and (ii) the readability of provenance polynomials of conjunctive queries. Point (i) is strictly subsumed by the TODS'15 article listed above.
  • Ranking in Probabilistic Databases: Complexity and Efficient Algorithms. [pdf] Conference
    Dan Olteanu and Hongkai Wen.
    In IEEE Int Conf on Data Engineering (ICDE), Washington DC, 2012.
    Ranking is strictly easier than exact probabilistic inference for conjunctive queries in probabilistic databases: the latter may share computation across all query answers that is hard but irrelevant for ranking. We give a complexity dichotomy for ranking and a general effective approach for ranking query results.

2011

  • Probablistic Databases. [ web ] book
    Synthesis Lectures on Data Management, Morgan & Claypool, 2011, 180 pages.
    Dan Suciu, Dan Olteanu, Christopher Re, Christoph Koch.
    DOI: 10.2200/S00362ED1V01Y201105DTM016.
     
  • The Next Big Challenge in Data Management: Probabilistic Databases. [web] news
    Robert Fink and Dan Olteanu.
    Invited article in The Digit Magazine, June 2011 (10th Anniversary Special), in print.
    More about the Digit Magazine in Wikipedia.
  • Managing Probabilistic Data with SPROUT. [FULL EDITION] news
    Dan Olteanu.
    In Inspired Research (2nd edition), The Department of Computer Science, University of Oxford. June 2011.
  • On Factorisation of Provenance Polynomials. [pdf, poster] Workshop
    Dan Olteanu and Jakub Závodný.
    In 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 2011, Heraklion, Crete.
     
  • Repeatability and Workability Evaluation of SIGMOD 2011. Journal
    Philippe Bonnet, Stefan Manegold, Matias Bjorling, Wei Cao, Javier Gonzales, Joel Granados, Nancy Hall, Stratos Idreos, Milena Ivanova, Ryan Johnson, David Koop, Tim Kraska, Rene Mueller, Dan Olteanu, Paolo Papotti, Christine Reilly, Dimitris Tsirogiannis, Cong Yu, Juliana Freire and Dennis Shasha.
    In SIGMOD Record, June 2011.
  • SPROUT^2: A Squared Query Engine for Uncertain Web Data. [pdf, poster, web] Demo
    Robert Fink, Andrew Hogue, Dan Olteanu, and Swaroop Rath.
    In ACM SIGMOD, Athens, June 2011.

    SPROUT² is a query answering system that allows users to ask structured queries over tables embedded in Web pages, over Google Fusion tables, over uncertain data extracted by NELL (Never-Ending Language Learning), and over uncertain tables that can be extracted from answers to Google Squared. At the core of this service lies SPROUT, our query engine for probabilistic databases. A more complete description of the demonstration is available here.
  • Providing Support for Full Relational Algebra in Probabilistic Databases. [pdf] Conference
    Robert Fink, Dan Olteanu, and Swaroop Rath.
    In IEEE Int Conf on Data Engineering (ICDE), Hannover, April 2011.

    We show how arbitrary relational algebra queries can be evaluated exactly or approximately in probabilistic databases using (partial) compilation of query lineage. We also conduct a tractability study for non-repeating queries with difference and for quantified queries (such as division and set inclusion/equivalence/incomparability). The case of non-repeating queries is developed in more depth in the PODS'14 paper.
  • On the Optimal Approximation of Queries using Tractable Propositional Languages. [pdf] Conference
    Robert Fink and Dan Olteanu.
    In Int. Conf. on Database Theory (ICDT), Uppsala, March 2011.

    We investigate approximations of non-repeating conjunctive queries in probabilistic databases by lower and upper bound queries that can be computed in polynomial time. The approach is based on finding model-based greatest lower bounds and least upper bounds for propositional formulas representing query lineage, where the bounds are in tractable languages such as (DNF) read-once formulas. All relevant results (and extensions) are in the VLDBJ'13 article.

2010

  • G-Store: A Storage Manager for Graph Data. [pdf] Tech Report
    Robin Steinhaus, Dan Olteanu, and Tim Furche.
    Technical report. Oct 2010.
    G-Store is a lightweight storage manager for graph data that exploits the graph structure for placing data on disk pages to maximize intra-page and minimize inter-page graph edges. This placement strategy can significantly speed up a wide range of access patterns found in graph queries.
  • Probabilistic XML via Markov Chains. [pdf] Journal
    Michael Benedikt, Evgeny Kharlamov, Dan Olteanu, and Pierre Senellart.
    In Very Large Data Bases (VLDB), Singapore, Sept 2010.

    We define a space of probabilistic XML models based on (restrictions of) Recursive Markov Chains and study their expressiveness, succinctness, and tractability of (MSO and XPath) query evaluation w/o unit-cost arithmetic. In contrast to existing probabilistic XML models, these new models (i) capture probabilistic versions of DTDs, (ii) can define infinite probability distributions, and (iii) can be exponentially more succinct, while still preserving MSO tractability.
  • Approximate Confidence Computation in Probabilistic Databases. [pdf, talk] Conference
    Dan Olteanu, Jiewen Huang, and Christoph Koch.
    In IEEE Int. Conf. on Data Engineering (ICDE), Long Beach, March 2010.

    We enhance our query engine SPROUT with a deterministic approximation algorithm with error guarantees for confidence computation in (arbitrarily correlated) probabilistic databases, which is based on incremental compilation of query lineage into so-called decomposition trees that support linear-time probability computation. The compilation is incremental and we show experimentally that it can achieve a given approximation within a few steps. In case of tractable (hierarchial or inequality) queries, it always finishes in polynomial time.
  • Bridging the Gap Between Intensional and Extensional Query Evaluation in Probabilistic Databases. [pdf] Conference
    Abhay Jha, Dan Olteanu, and Dan Suciu.
    In Extending Data Base Technology (EDBT), Lausanne, March 2010.

    We extend the notion of tractable queries on (arbitrarily correlated) probabilistic databases to tractable data-query instances, where general hard queries can become tractable on restricted data instances. The query evaluation is separated into two steps: (1) the (possibly large) tractable data-query instance is evaluated first by efficient database techniques, and then (2) the (usually small) intractable residue is processed using inference techniques based on treewidth.

2009

  • On Tractability of Inequality Queries over Probabilistic Databases and Counting Vertex Covers. [pdf] Tech Report
    Dan Olteanu and Rasmus Wissmann.
    Technical report, September 2009.
  • Secondary-Storage Confidence Computation for Conjunctive Queries with Inequalities. [pdf] Conference
    Dan Olteanu, Jiewen Huang.
    In ACM SIGMOD, Providence, June 2009.

    This paper is the first to discuss tractability of conjunctive queries with inequalities in probabilistic databases. It presents a class of hard queries, and gives a scalable algorithm for tractable inequality queries that is based on ordered binary decision diagrams and is implemented in our query engine SPROUT. Our implementation passed the SIGMOD'09 repeatability and workability evaluation (RWE). Out of 63 accepted papers, 19 participated in RWE, and 10 passed RWE.

    Resources:
    • Scripts for running the experiments.
    • Code of the SPROUT query engine used in the experiments.
  • MayBMS: A Probabilistic Database Management System. [pdf, video] Demo
    Jiewen Huang, Lyublena Antova, Christoph Koch, and Dan Olteanu.
    In ACM SIGMOD, Providence, June 2009.

    Demo of the latest version of MayBMS with full support for uncertainty-aware queries, succinct representation formalism for probabilistic data, and the new query engine SPROUT.

    Resources:
  • SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases. [pdf, talk] Conference
    Dan Olteanu, Jiewen Huang, Christoph Koch.
    In IEEE Int Conf on Data Engineering (ICDE), Shanghai, April 2009.

    We show that the tractable hierarchical queries on probabilistic databases admit relational query plans with unrestricted join ordering, thereby overcoming the limitations of the safe plans of earlier work. In this framework, safe plans can be seen as one specific type of eager plans (where confidence computation is done as soon as possible). We extend relational plans with a new aggregation operator that is optimized using query signature and schema information (keys) to minimize the number of scans it needs to compute tuple confidences. This operator basically factorizes the query lineage into read-once functions (called 1OF formulas in the paper) in polynomial time. An extensive study on TPC-H queries is available here.

    Resources:
  • 10^10^6 Worlds and Beyond: Efficient Representation and Processing of Incomplete Information. [pdf] Journal
    Lyublena Antova, Christoph Koch, Dan Olteanu.
    In the Special Issue of the VLDB Journal on Probabilistic Databases, 2009.
    Extended version of the ICDE 2007 paper.

    We discuss world-set decompositions, a complete representation formalism for uncertain and probabilistic data, and study query evaluation on these representations. World-set decompositions are based on factorizations to exploit independence and, at least in their probabilistic form, can be thought of as shallow Bayesian Networks.

2008

  • Using OBDDs for Efficient Query Evaluation on Probabilistic Databases. [pdf] Conference
    Dan Olteanu, Jiewen Huang.
    In 2nd Int. Conf. on Scalable Uncertainty Management (SUM), Napoli, Oct. 2008.

    This paper is the first to consider OBDDs for efficient evaluation of queries in probabilistic databases. It shows that for tractable hierarchical queries, the query lineage can be efficiently brought into one-occurrence form (that is, into an equivalent read-once function), where each variable occurs exactly once. Such factorizable lineage admit OBDDs of size linear in the number of their variables only. For hard queries, existing results that bound the OBDD size in the pathwidth of the lineage apply immediately.
  • Conditioning Probabilistic Databases. [arxiv, talk] Journal
    Christoph Koch, Dan Olteanu.
    In Very Large Data Bases (VLDB), volume 1, Auckland, Aug 2008.

    This paper is the first to consider the problem of conditioning a probabilistic database outside the context of graphical models. The core contribution is an exact confidence computation algorithm and its extension to achieve conditioning by database constraints.

    Resources:
    • CODE: Exact and approximate probability computation using world-set trees (now part of MayBMS).
  • Fast and Simple Relational Processing of Uncertain Data. [arxiv, talk] Conference
    Lyublena Antova, Thomas Jansen, Christoph Koch, Dan Olteanu.
    In IEEE Int. Conf. on Data Eng. (ICDE), Cancun, April 2008 (ranked 2nd in the research track).
    Also ACM CORR Report cs.DB/0707.1644.

    This paper presents U-relational databases, the representation system of MayBMS that is an extension of c-tables with probabilities and vertical partitioning. We also discuss the efficient, purely relational evaluation of relational algebra in such databases.

    Resources:
    • CODE: Extension of TPC-H population generator for attribute-level U-relational databases.
    • CODE: Queries and (U-relations to ULDBs, attribute-level to tuple-level) translators used in the experiments.
  • World-Set Decompositions: Expressiveness and Efficient Algorithms. [pdf] Journal
    Dan Olteanu, Christoph Koch, Lyublena Antova.
    In Elsevier Journal Theoretical Computer Science (TCS) , volume 403, Issues 2-3, Pages 133-416, 2008.
    Also ACM CORR Report cs.DB/0705.4442. Long version of the ICDT'07 paper.

    This paper studies the theory of world-set decompositions (that is, succinctness, expressiveness, and tractability of standard decision problems). Of particular interest is the factorization algorithm, which does a form of minimization of representations.
  • MayBMS: A System for Managing Large Uncertain and Probabilistic Databases. Workshop
    Lyublena Antova, Christoph Koch, Dan Olteanu.
    Best Poster Award at Spring'08 North East DB/IR Day, Columbia University, April 18, 2008.
  • A semi-empirical simulation of the extragalactic radio continuum sky. [arxiv] Journal
    R.J. Wilman, L. Miller, M.J. Jarvis, T. Mauch, F. Levrier, F.B. Abdalla, S. Rawlings, H.-R. Kloeckner, D. Obreschkow, D. Olteanu, S. Young.
    In Monthly Notices of the Royal Astronomical Society Main Journal, 2008.

2007

  • World-Set Decompositions: Expressiveness and Efficient Algorithms. [arxiv] Conference
    Lyublena Antova, Christoph Koch, Dan Olteanu.
    In Int. Conf. on Database Theory (ICDT), Barcelona, Jan. 2007.
    Long version appeared in TCS 2008 and is available as ACM CORR Report cs.DB/0705.4442.

    This paper studies the theory of world-set decompositions (that is, succinctness, expressiveness, and tractability of standard decision problems). Of particular interest is the factorization algorithm, which does a form of minimization of representations.
  • 10^10^6 Worlds and Beyond: Efficient Representation and Processing of Incomplete Information. [pdf] Conference
    Lyublena Antova, Christoph Koch, Dan Olteanu.
    In IEEE 23rd Int. Conf. on Data Eng. (ICDE), Istanbul, April 2007.
    Extended version in ACM CORR Report cd.DB/0606075.

    We discuss world-set decompositions, a complete representation formalism for uncertain data, and study query evaluation on these representations. World-set decompositions are based on factorizations to exploit independence.
  • From Complete to Incomplete Information and Back. [pdf] Conference
    Lyublena Antova, Christoph Koch, Dan Olteanu.
    In ACM SIGMOD, Beijing, June 2007.

    This paper presents the nonprobabilistic version of the MayBMS query language and studies some of its properties: genericity, expressiveness, conservativity over relational algebra, efficiency of evaluation, and algebraic equivalences.
  • Query language support for incomplete information in the MayBMS system. [pdf] Demo
    Lyublena Antova, Christoph Koch, Dan Olteanu.
    In Very Large Data Bases (VLDB), Vienna, Sept 2007.

    Demonstration of the first version of the MayBMS query language. It was much improved since then (check out the SIGMOD'09 demo).
  • MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions. [pdf] Demo
    Lyublena Antova, Christoph Koch, Dan Olteanu.
    In IEEE 23rd Int. Conf. on Data Eng. (ICDE), Istanbul, April 2007.

    Demo of the first version MayBMS which used world-set decompositions as the representation formalism for uncertain data. In the mean time, MayBMS migrated to U-relational databases, a more succinct representation formalism (check out the SIGMOD'09 demo).
  • Forward Node-Selecting Queries Over Trees. [pdf] Journal
    Dan Olteanu.
    In ACM Trans. on Database Systems (TODS), vol. 32 no. 1, March 2007.

    This paper gives an in-depth theoretical study of the problem of rewriting XPath-like queries into equivalent queries without backwards steps. It introduces and studies two rewriting systems (confluence, termination, soundness and completeness, complexity, succinctness of rewritten over input queries) that can rewrite any "absolute" XPath-like navigational query into a backward-free equivalent.
  • SPEX: Streamed and Progressive Evaluation of XPath. [pdf] Journal
    Dan Olteanu.
    In IEEE Trans. Know. and Data Eng. (TKDE), vol. 19 no. 7, July 2007.

    This is the latest and (main) paper on the SPEX query engine, one of the first streaming XPath engines with polynomial combined complexity. SPEX compiles forward XPath queries (with support for all forward axes) into networks of transducers that process the XML stream tag after tag. For XPath with backwards axes, it uses a rewriting technique defined in earlier work of the author. SPEX features a distributed stack-like data structure that is shared by the incremental computations of the (possibly potential) answers, and a mechanism for reducing the stream traffic across transducers. It is publicly available at spex.sourceforge.net.

2006

  • Accelerating XPath Evaluation against XML Streams. Demo
    Dan Olteanu.
    In Programming Language Technologies for XML (PLAN-X), Charleston, South Carolina, January 2006.
  • Building a Native XML-DBMS as a Term Project in a Database Systems Course. [pdf] Workshop
    Christoph Koch, Dan Olteanu, Stefanie Scherzinger.
    In XQuery Implementations, Experience, and Perspectives (XIME-P), Chicago, June 2006.

2005

  • Evaluation of XPath Queries against XML streams. [pdf] Book
    Dan Olteanu.
    PhD Thesis, Institute for Informatics, Ludwig Maximilian University of Munich.
  • The XML Stream Query Processor SPEX. [pdf] Demo
    Francois Bry, Fatih Coskun, Serap Durmaz, Tim Furche, Dan Olteanu, Markus Spannagel.
    In IEEE 21st Int. Conf. on Data Eng. (ICDE), Tokyo, April 2005.

2004

  • An Efficient Single Pass Query Evaluator for XML Data Streams. [pdf] Conference
    Dan Olteanu, Tim Furche, Francois Bry.
    In ACM Symposium on Applied Computing (SAC) (Data Streams Track), Nicosia, March 2004.
  • Evaluating Complex Queries against XML Streams with Polynomial Combined Complexity. [extended] Conference
    Dan Olteanu, Tim Furche, Francois Bry.
    In British National Conf. on Databases (BNCOD21), Edinburgh, July 2004.
  • Aktuelles Schlagwort: Datenströme. [web] Journal
    Francois Bry, Tim Furche, Dan Olteanu.
    In Informatik Spektrum, 2/27, April 2004. Also in GI Informatiklexikon.

2003

  • An Evaluation of Regular Path Expressions with Qualifiers against XML Streams. [extended] Conference
    Dan Olteanu, Tobias Kiesling, Francois Bry.
    In IEEE 19th Int. Conf. on Data Eng. (ICDE), Bangalore, March 2003.

2002

  • XPath: Looking Forward. [pdf, extended] Workshop
    Dan Olteanu, Holger Meuss, Tim Furche, Francois Bry.
    In Workshop on XML-Based Data Management (XMLDM) at EDBT 2002, Prague, March 2002.
    A previous version of this article is entitled Symmetry in XPath.

    This paper shows that "absolute" XPath navigational queries can be rewritten into equivalent queries without backwards steps.

2001

  • Towards Grouping Constructs for Semistructured Data. [pdf, extended] Workshop
    Francois Bry, Dan Olteanu, Sebastian Schaffert.
    In Workshop on Electronic Bussiness Hubs (WebH) at DEXA 2001, Munich 2001.
  • Aktuelles Schlagwort: Semistrukturierte Daten. [web] Journal
    Francois Bry, Michael Kraus, Dan Olteanu, Sebastian Schaffert.
    In Informatik Spektrum, 4/24, August 2001. Also in GI Informatiklexikon.

2000

  • Agora - Living with XML and Relational. [pdf] Demo
    Ioana Manolescu, Daniela Florescu, Donald Kossmann, Dan Olteanu, Florian Xhumari.
    In Very Large Data Bases (VLDB), Cairo, 2000.
  • Answering Queries using Views in Agora. Book
    Dan Olteanu.
    Diploma Thesis. Work completed at Politehnica University of Bucharest and INRIA Rocquencourt. 2000.