Modern Data Analytics 2021
Organization: Prof. Dr. Dan Olteanu, Prof. Dr. Michael Böhlen
This seminar gives an overview of recent research developments at the intersection of databases and machine learning. In particular, it considers two distinct lines of work:
- The application of machine learning to databases: Use models to predict query performance or replace traditional modules in a database management system such as indices.
- The application of databases to machine learning: Use database techniques to improve the runtime performance for training machine learning models.
Learning outcome: The goal of the seminar is to expose the students to the recent trends in academia and industry on rethinking database management systems and on how to effectively unify knowledge on both machine learning and databases to scale data science workloads.
Target audience: MSc in Data Science students (the maximum number of students is restricted to 18)
Semester: This seminar will be offered in Fall 2021.
Teaching format: Each participant writes a self-contained report of about 10 pages, gives a 30-minute presentation, and answers follow-up technical questions in a 15-minute Q&A session. Each participant has a buddy. Buddies read the report, make suggestions for improvements, and help with the presentation (e.g., dry runs). The first version of the report is due four weeks before the date of the presentation. Please use this template for the report. This first version of the report will then be discussed with the buddy and the teacher. The final version of the report is due on the day of the presentation.
Registration: Please register as required by the department. In addition, please browse the papers mentioned below. In the kickoff meeting, the papers will be assigned to students, so make sure you get assigned to a paper you want.
Meetings: The first meeting will be on Tuesday, September 21, 2021 from 2pm to 4pm in room BIN 1.D.29. In this meeting, the organizers will present an overview of the topics to be investigated in the seminar and answer questions from the participants. In this session, students will be assigned to papers. The slides of the first meeting can be found here.
The student presentations will take place on Saturdays November 27 and December 4, 2021 in Room BIN 2.A.10.
Participation in all three meetings is compulsory. The assessment depends on the quality of the report, presentation, active participation during the seminar, and input as a buddy.
How to read papers and give talks
How to read papers:
- Focus questions to help identify the main contributions of a paper
- Survival kit includes tips on how to read technical sections and the "three-pass approach" to tie it all together
- Reading Research Papers by Andrew Ng
How to give talks:
- These two articles have a number of good suggestions.
- This video is pretty good as well.
- How To Speak by Patrick Winston - a newer version of Patrick's talk
Papers to be read by all students
The following are individual paper assignments organized by topics. Whenever an entry has two papers, this means that both papers can be presented together (as they use similar ideas), or only one of them can be presented.
Topic 1: Learned Data Structures used in Database Systems
Topic 2: Learned Query Optimization and Evaluation
- 2.1 How Good are Query Optimizers, Really?
- 2.2 Learning to Optimize Join Queries With Deep Reinforcement Learning
- 2.3 Bao: Making Learned Query Optimization Practical
- 2.4 Learned Cardinalities: Estimating Correlated Joins with Deep Learning
- 2.5 Are We Ready For Learned Cardinality Estimation?
- 2.6 SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning
- 2.7 Scalable Multi-Query Execution using Reinforcement Learning
Topic 3: In-database Machine Learning and Linear Algebra
- 3.1 The MADlib Analytics Library or MAD Skills, the SQL
- 3.2 Learning Linear Regression Models over Factorized Joins
- 3.3 A Layered Aggregate Engine for Analytics Workloads
- 3.4 Rk-means: Fast Clustering for Relational Data
- 3.5 LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation
- 3.6 Compressed linear algebra for declarative large-scale machine learning
- 3.7 Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations
Paper Assignments, Buddies, and Supervisors
Paper | Name | Buddy | Supervisor |
---|---|---|---|
1.1 | Manuel Beyeler | Anh Pham | Ahmet Kara |
1.2 | Anh Pham | Manuel Beyeler | Ahmet Kara |
1.3 | Lukas Vollenweider | Daniel Demeter | Ahmet Kara |
2.1 | Sukriti Sinha | Tijana Kostovic | Michael Böhlen |
2.2 | Kexin Shi | Jacob Gelling | Ahmet Kara |
2.3 | Tianshuai Lu | Kartikey Sharma | Michael Böhlen |
2.4 | Kartikey Sharma | Tianshuai Lu | Michael Böhlen |
2.5 | Yuhan Lin | Minghao Li | Michael Böhlen |
2.6 | Georgios Anagnostou | Linus Stach | Dan Olteanu |
3.1 | Linus Stach | Georgios Anagnostou | Dan Olteanu |
3.2 | Minghao Li | Yuhan Lin | Dan Olteanu |
3.3 | Jacob Gelling | Kexin Shi | Nils Vortmeier |
3.4 | Peilin He | Yilan wu | Nils Vortmeier |
3.5 | Tijana Kostovic | Sukriti Sinha | Nils Vortmeier |
3.6 | Yilan wu | Peilin He | Nils Vortmeier |
3.7 | Daniel Demeter | Lukas Vollenweider | Nils Vortmeier |
Papers 1.1 to 2.5 will be presented on Saturday, November 27, 2021
Papers 2.6 to 3.7 will be presented on Saturday, December 4, 2021