Seminar Database Systems (PhD, MSc, BSc)
Organization: | Michael Böhlen, Sven Helmer and Paolo Penna |
Teaching language: | English |
Level: | Advanced BSc, MSc and PhD students |
Academic Year: | Spring 2019 |
Dates: | Tuesday 19.2.2019, 16.30 - 18.00, UZH, BIN 0.K.11/12/13 |
Overview and objectives: The area of this year's seminar is Algorithms and Systems for Data Science. Students learn how to critically read and study research papers, how to summarize the contents of a paper, and how to present it in a seminar.
Teaching format: Each participant writes a self-contained report of about 10 pages and gives a 30-minute presentation (blackboard, without a computer). Each participant has a buddy. Buddies read the report, make suggestions for improvements, and help with the presentation (e.g., dry runs). The first version of the report is due three weeks before the date of the presentation. This first version of the report and the presentation will be discussed with the buddy and the teacher about two weeks before the presentation. The final version of the report is due one week before the presentation.
Setup and Organization: The setup of the seminar will be discussed on Tuesday, February 19, 2019, from 16:30 until 18:00 in room (tba) at UZH. At this first meeting, the available slots for the seminar will be distributed and papers will be assigned.
Presentations:
- Saturday April 13, BIN 2.A.01
- Saturday May 11, CAB H.52 (we meet at 8:50 in front of the south entrance, the one closest to the tram stop ETH/Universitätsspital)
Participation in all three meetings is compulsory. The assessment depends on the quality of the report and the presentation, active participation during the seminar, and input as a buddy.
Useful links:
- organizational slides (PDF, 64 KB)
- How to give talks and read papers: link
Topics
1. Architectures and Systems
- MISO: Souping up Big Data Query Processing with a Multistore System, SIGMOD 2014. (PDF, 2 MB)
- RHEEM: Enabling Cross-Platform Data Processing, PVLDB 2018. (PDF, 1 MB)
- Abstraction for Advanced In-Database Analytics, PVLDB 2018. (PDF, 554 KB)
2. Column Stores
- Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation, SIGMOD 2018. (PDF, 1 MB)
- Column-Stores vs. Row-Stores: How Different Are They Really?, SIGMOD 2008. (PDF, 789 KB)
- Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?, SIGMOD 2017. (PDF, 683 KB)
3. Streams
- Incremental Query Processing on Big Data Streams, TKDE 2016. (PDF, 433 KB)
- The Stratosphere Platform for Big Data Analytics, VLDB Journal 2014. (PDF, 2 MB)
- Drizzle: Fast and Adaptable Stream Processing at Scale, SOSP 2017. (PDF, 767 KB)
4. Spark
- Spark SQL: Relational Data Processing in Spark, SIGMOD 2015. (PDF, 984 KB)
- SHC: Distributed Query Processing for Non-Relational Data Store, ICDE 2018. (PDF, 892 KB)
- Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data, OSDI 2018. (PDF, 711 KB)
5. Query Processing
- A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection. KDD 2017. (PDF, 720 KB)
- Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014. (PDF, 1 MB)
- Optimizing Big Data Queries Using Program Synthesis, SOSP 2017. (PDF, 1018 KB)
6. Clustering
- Clustering with Same-Cluster Queries. NIPS 2016. (PDF, 274 KB)
- A Hierarchical Algorithm for Extreme Clustering. KDD 2017. (PDF, 1 MB)
- Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes. VLDB 2018. (PDF, 1 MB)
Saturday, April 13, 2019
Topic | Presenter | Buddy | Advisor |
---|---|---|---|
MISO: Souping up Big Data Query Processing with a Multistore System, SIGMOD 2014. (PDF, 2 MB) | Pascal Engeli | Michael Studer | Sven Helmer |
RHEEM: Enabling Cross-Platform Data Processing, PVLDB 2018. (PDF, 1 MB) | Mesut Ceylan | Alex Wolf | Sven Helmer |
Abstraction for Advanced In-Database Analytics, PVLDB 2018. (PDF, 554 KB) | Sara Decova | Maximilian Wolfertz | Michael Böhlen |
Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation, SIGMOD 2018. (PDF, 1 MB) | Catharina Dekker | Clive Charles Javara | Paolo Penna |
Column-Stores vs. Row-Stores: How Different Are They Really?, SIGMOD 2008. (PDF, 789 KB) | Peter Giger | Han-Mi Nguyen | Sven Helmer |
Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?, SIGMOD 2017. (PDF, 683 KB) | Mike Suter | Luca Wolf | Michael Böhlen |
Incremental Query Processing on Big Data Streams, TKDE 2016. (PDF, 433 KB) | Lorenzo Selvatici | Timon Stampfli | Paolo Penna |
The Stratosphere Platform for Big Data Analytics, VLDB Journal 2014. (PDF, 2 MB) | Syed Shahvaiz Ahmed | Donn Edward Anin | Paolo Penna |
Drizzle: Fast and Adaptable Stream Processing at Scale, SOSP 2017. (PDF, 767 KB) | Yichun Xie | Emilien Pierre Carlo Pilloud | Michael Böhlen |
Saturday, May 11, 2019
Topic | Presenter | Buddy | Advisor |
---|---|---|---|
Spark SQL: Relational Data Processing in Spark, SIGMOD 2015. (PDF, 984 KB) | Luca Wolf | Sara Decova | Sven Helmer |
SHC: Distributed Query Processing for Non-Relational Data Store, ICDE 2018. (PDF, 892 KB) | Donn Edward Anin | Lorenzo Selvatici | Sven Helmer |
Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data, OSDI 2018. (PDF, 711 KB) | Clive Charles Javara | Syed Shahvaiz Ahmed | Sven Helmer |
A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection. KDD 2017. (PDF, 720 KB) | Emilien Pierre Carlo Pilloud | Mesut Ceylan | Paolo Penna |
Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014. (PDF, 1 MB) | Maximilian Wolfertz | Mike Suter | Michael Böhlen |
Optimizing Big Data Queries Using Program Synthesis, SOSP 2017. (PDF, 1018 KB) | Alex Wolf | Yichun Xie | Michael Böhlen |
Clustering with Same-Cluster Queries. NIPS 2016. (PDF, 274 KB) | Michael Studer | Peter Giger | Paolo Penna |
A Hierarchical Algorithm for Extreme Clustering. KDD 2017. (PDF, 1 MB) | Han-Mi Nguyen | Pascal Engeli | Paolo Penna |
Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes. VLDB 2018. (PDF, 1 MB) | Timon Stampfli | Catharina Dekker | Michael Böhlen |