Modern Data Analytics 2025

Organization: Prof. Dr. Dan Olteanu, Dr. Andrei Draghici, Dr. Haozhe Zhang, Christoph Mayer, Eden Chmielewski, and Yuchen He.

This seminar provides a deep dive into the recent research developments reshaping the core of modern database systems: query processing and optimization. The performance of virtually every data-driven application hinges on the database's ability to translate declarative queries into efficient, low-level execution plans. However, the sheer complexity of modern analytics, the demand for real-time results, and the scale of today's datasets are pushing classical, heuristic-based optimizers to their breaking point.

Learning outcome: The goal of the seminar is to expose the students to the recent trends in academia and industry on rethinking modern data analytics systems. The students will read and present research published in the top international venues in data management research, in particular ACM Special Interest Group on Management of Data (SIGMOD) and Very Large Data Bases (VLDB). Students will gain a deep understanding of the challenges and state-of-the-art solutions in query optimization, robust execution, and real-time analytics maintenance. The course will equip them to critically analyze and contribute to the development of next-generation, high-performance data systems.

Target audience: MSc students in Software Engineering, Data Science, and AI.

Semester: This seminar will be offered in Fall 2025.

Teaching format: Each participant prepares a presentation based on a research paper; answers follow-up technical questions; reads the other papers in the seminar session; and actively participates in the technical discussions in the seminar. Each participant has a buddy, who will help improve their presentation by suggesting improvements and attending dry runs. The best presentation of the seminar will be selected by the participants and receive a prize.

Registration: Please register as required by the department. In addition, please browse the papers mentioned below. In the kickoff meeting, the papers will be assigned to students, so make sure you get assigned to a paper you want.

Meetings: The first meeting will be on Thursday, September 18, 2025, from 10:15 to 12:00 in room BIN 1.D.29. The organizers will give an overview of the topics to be investigated in the seminar and answer questions from the participants. In this session, students will be assigned to papers.

The student presentations will take place on Saturday, November 8, and Saturday, November 22, 2025, in BIN 2.A.01.

Participation in all three meetings is compulsory. The assessment depends on the quality of the presentation, active participation during the seminar, and input as a buddy.

How to read papers and give talks

How to read papers:

Focus questions to help identify the main contributions of a paper
Survival kit includes tips on how to read technical sections and the "three-pass approach" to tie it all together
Reading Research Papers by Andrew Ng

How to give talks:

These two articles have a number of good suggestions.
This video is pretty good as well.
How To Speak by Patrick Winston - a newer version of Patrick's talk

Slides from the kick-off meeting

The slides from the kick-off meeting are available here: Introduction slides. Below you can find the assignments for the two presentation days. If you want to get in contact with your supervisor, here is our list of emails:

Presentations for November 8th

How Good are Query Optimizers, Really? Still Asking: How Good Are Query Optimizers, Really?
- Presented by: Müge Yegin
- Buddy: Xinyao Cao
- Supervisor: Christoph Mayer
SQLStorm: Taking Database Benchmarking into the LLM Era
- Presented by: Xinyao Cao
- Buddy: Noah Croes
- Supervisor: Dan Olteanu
How Good are Learned Cost Models, Really? Insights from Query Optimization Tasks
- Presented by: Noah Croes
- Buddy: Müge Yegin
- Supervisor: Andrei Draghici
SafeBound: A Practical System for Generating Cardinality Bounds
- Presented by: Michael Sigg
- Buddy: Sofoklis Strompolas
- Supervisor: Yuchen He
Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server
- Presented by: Sofoklis Strompolas
- Buddy: Michael Sigg
- Supervisor: Eden Chmielewski
DPconv: Super-Polynomially Faster Join Ordering
- Presented by: Lihui Zhou
- Buddy: Annamaria Vass
- Supervisor: Yuchen He
How to Optimize SQL Queries? A Comparison Between Split, Holistic, and Hybrid Approaches
- Presented by: Annamaria Vass
- Buddy: Lihui Zhou
- Supervisor: Eden Chmielewski

Presentations for November 22nd

Robust Join Processing with Diamond Hardened Joins
- Presented by: Marcelina Suszczyk
- Buddy: Elif Deniz İșbuğa
- Supervisor: Yuchen He
SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning
- Presented by: Philipp Stoffel
- Buddy: Uros Dimitrijevic
- Supervisor: Christoph Mayer
ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning
- Presented by: Uros Dimitrijevic
- Buddy: Nishant Kumar
- Supervisor: Christoph Mayer
Holistic Query Approximation via RL Modeling
- Presented by: Nishant Kumar
- Buddy: Philipp Stoffel
- Supervisor: Andrei Draghici
Query running too slow? Rewrite it with Quorion!
- Presented by: Elif Deniz İșbuğa
- Buddy: Marcelina Suszczyk
- Supervisor: Eden Chmielewski
Streaming View: An Efficient Data Processing Engine for Modern Real-time Data Warehouse of Alibaba Cloud
- Presented by: Brighton Thomas
- Buddy: Akos Istvan Imets
- Supervisor: Haozhe Zhang
Streaming Democratized: Ease Across the Latency Spectrum with Delayed View Semantics and Snowflake Dynamic Tables
- Presented by: Akos Istvan Imets
- Buddy: Lin Han
- Supervisor: Dan Olteanu
Automated Generation of Materialized Views in Oracle
- Presented by: Lin Han
- Buddy: Brighton Thomas
- Supervisor: Haozhe Zhang

The following papers are left here to provide a broader context

Topic 1: Benchmarks for Query Optimization

Topic 2: Cardinality Estimation

For everyone to read: Pessimistic Cardinality Estimation

Topic 3: Query Optimization

Topic 4: Factorized Query Processing

FDB: a query engine for factorised relational databases
Graphflow: An Active Graph Database. (Paper, Blog Post)
The ubiquity of large graphs and surprising challenges of graph processing: extended survey
Robust Join Processing with Diamond Hardened Joins
Adaptive factorization using linear-chained hash tables

Topic 5: Query Processing using Reinforcement Learning

Topic 6: Robust Query Processing

Topic 7: Incremental View Maintenance

For everyone to read: Recent Increments in Incremental View Maintenance
