Cardinality Estimation with Guarantees
Cardinality estimation is a longstanding and fundamental problem in databases, dubbed the Achilles' heel of query optimizers: Given a query, the goal is to estimate the size of its output without computing the query. Query optimizers rely on these estimates to decide how to execute a query. Inaccurate estimates can lead to poor query plans and significant performance penalties. Traditional cardinality estimators used in open-source and commercial database systems routinely overestimate or underestimate the true output size, since they make various unrealistic assumptions about data. Learned estimators can perform very well when tuned for a specific dataset, yet they come with large models and training time and they are not explainable nor robust. Over the past decade, there has been significant theoretical progress on cardinality estimation, yet the proposed upper bounds remain too loose and ineffective in practice.
LpBound [demo and 5-min video] is a new approach to cardinality estimation that significantly furthers the theory while also being practical. LpBound translates the cardinality estimation problem into a linear optimization problem, whose objective is the maximization of the joint entropy of the variables in the query and whose constraints are the basic Shannon inequalities and new inequalities that upper-bound terms involving conditional entropies on the join variables by ℓp-norms on degree sequences of the corresponding join columns.
Check out our 20-minute video for an overview of LpBound!
Talks
- Can Information-Theoretic Guaranteed Cardinality Upper Bounds lead the Way?[slides (PDF, 364 KB)]
ACM SIGMOD/PODS 2025 special event on Query Optimization Unleashed — Bridging Theory, AI, and Systems for the Future of Databases, June 2025
- Information Theory Strikes Back: New Development in the Theory of Cardinality Estimation
Logic and Algorithms in Database Theory and AI Reunion, Simons Institute for the Theory of Computing, UC Berkeley, Jan 2025
- Cardinality Estimation using Lp-norms on Degree Sequences
Seminar @ MIT CSAIL, November 2024
- Cardinality Estimation using Lp-norms on Degree Sequences
LFCS Seminar at University of Edinburgh, November 2024
- Cardinality Estimation using Lp-norms on Degree Sequences
Microsoft Gray Systems Lab, September 2024
Publications
- LpBound: Pessimistic Cardinality Estimation Using Lp-Norms of Degree Sequences. [arxiv] Journal
Haozhe Zhang, Christoph Mayer, Mahmoud Abo Khamis, Dan Olteanu, Dan Suciu
To appear in ACM SIGMOD (PACMMOD Journal), June 2025.
Best Paper Award
- Information Theory Strikes Back: New Development in the Theory of Cardinality Estimation. [arxiv] Journal
Mahmoud Abo Khamis, Vasileios Nakos, Dan Olteanu, Dan Suciu.
To appear in ACM SIGMOD Record, March 2025.
SIGMOD Research Highlights Award
- LpBound in Action: Cardinality Estimation with One-Sided Guarantees. [Demo; 5-min video]
Haozhe Zhang, Christoph Mayer, Mahmoud Abo Khamis, Dan Olteanu, Dan Suciu
To appear in ACM SIGMOD (PACMMOD Journal), June 2025. Demonstration.
- Pessimistic Cardinality Estimation. [arxiv ] Journal
Mahmoud Abo Khamis, Kyle Deeds, Dan Olteanu, Dan Suciu.
In SIGMOD Record, Database principles column, December 2024.
- Join Size Bounds using Lp-Norms on Degree Sequences. [ arxiv] Journal
Mahmoud Abo Khamis, Vasileios Nakos, Dan Olteanu, Dan Suciu.
In PODS (PACMMOD Journal), Distinguished Paper, June 2024.
Team
- Mahmoud Abo Khamis (RelationalAI)
- Kyle Deeds (University of Washington)
- Christoph Mayer (University of Zurich)
- Vasileios Nakos (National and Kapodistrian University of Athens)
- Dan Olteanu (University of Zurich)
- Dan Suciu (University of Washington)
- Haozhe Zhang (University of Zurich)