Seminar Database Systems (PhD, MSc, BSc)

Organization:

Michael Böhlen, Sven Helmer and Paolo Penna
Teaching language: English
Level: Advanced BSc, MSc and PhD students
Academic Year: Spring 2019
Dates:

Tuesday 19.2.2019, 16.30 - 18.00, UZH, BIN 0.K.11/12/13
Saturday 13.4.2019, 9.00 - 15.00, UZH, BIN 2.A.01
Saturday 11.5.2019, 9.00 - 15.00, ETH, CAB H.52


Overview and objectives: The area of this year's seminar is Algorithms and Systems for Data Science. Students learn how to critically read and study research papers, how to summarize the contents of a paper, and how to present it in a seminar.

Teaching format: Each participant writes a self-contained report of about 10 pages and gives a 30-minute presentation (blackboard, without a computer). Each participant has a buddy. Buddies read the report, make suggestions for improvements, and help with the presentation (e.g., dry runs). The first version of the report is due three weeks before the date of the presentation. This first version of the report and the presentation will be discussed with the buddy and the teacher about two weeks before the presentation. The final version of the report is due one week before the presentation.

Setup and Organization: The setup of the seminar will be discussed on Tuesday, February 19, 2019, from 16:30 until 18:00 in room (tba) at UZH. At the first meeting, the available slots for the seminar will be distributed and papers will be assigned.

Presentations:

  • Saturday April 13, BIN 2.A.01
  • Saturday May 11, CAB H.52 (we meet at 8:50 in front of the south entrance, the one closest to the tram stop ETH/Universitätsspital)

Participation in all three meetings is compulsory. The assessment depends on the quality of the report and the presentation, active participation during the seminar, and input as a buddy.

Topics

1. Architectures and Systems

2. Column Stores

3. Streams

4. Spark

5. Query Processing

6. Clustering

Saturday, April 13, 2019

Topic | Presenter | Buddy | Advisor
MISO: Souping up Big Data Query Processing with a Multistore System, SIGMOD 2014. (PDF, 2213 KB) | Pascal Engeli | Michael Studer | Sven Helmer
RHEEM: Enabling Cross-Platform Data Processing, PVLDB 2018. (PDF, 1874 KB) | Mesut Ceylan | Alex Wolf | Sven Helmer
Abstraction for Advanced In-Database Analytics, PVLDB 2018. (PDF, 554 KB) | Sara Decova | Maximilian Wolfertz | Michael Böhlen
Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation, SIGMOD 2018. (PDF, 1430 KB) | Catharina Dekker | Clive Charles Javara | Paolo Penna
Column-Stores vs. Row-Stores: How Different Are They Really?, SIGMOD 2008. (PDF, 789 KB) | Peter Giger | Han-Mi Nguyen | Sven Helmer
Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?, SIGMOD 2017. (PDF, 683 KB) | Mike Suter | Luca Wolf | Michael Böhlen
Incremental Query Processing on Big Data Streams, TKDE 2016. (PDF, 433 KB) | Lorenzo Selvatici | Timon Stampfli | Paolo Penna
The Stratosphere Platform for Big Data Analytics, VLDB Journal 2014. (PDF, 2233 KB) | Syed Shahvaiz Ahmed | Donn Edward Anin | Paolo Penna
Drizzle: Fast and Adaptable Stream Processing at Scale, SOSP 2017. (PDF, 767 KB) | Yichun Xie | Emilien Pierre Carlo Pilloud | Michael Böhlen

Saturday, May 11, 2019

Topic | Presenter | Buddy | Advisor
Spark SQL: Relational Data Processing in Spark, SIGMOD 2015. (PDF, 984 KB) | Luca Wolf | Sara Decova | Sven Helmer
SHC: Distributed Query Processing for Non-Relational Data Store, ICDE 2018. (PDF, 892 KB) | Donn Edward Anin | Lorenzo Selvatici | Sven Helmer
Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data, OSDI 2018. (PDF, 711 KB) | Clive Charles Javara | Syed Shahvaiz Ahmed | Sven Helmer
A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection, KDD 2017. (PDF, 720 KB) | Emilien Pierre Carlo Pilloud | Mesut Ceylan | Paolo Penna
Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014. (PDF, 1320 KB) | Maximilian Wolfertz | Mike Suter | Michael Böhlen
Optimizing Big Data Queries Using Program Synthesis, SOSP 2017. (PDF, 1018 KB) | Alex Wolf | Yichun Xie | Michael Böhlen
Clustering with Same-Cluster Queries, NIPS 2016. (PDF, 274 KB) | Michael Studer | Peter Giger | Paolo Penna
A Hierarchical Algorithm for Extreme Clustering, KDD 2017. (PDF, 1498 KB) | Han-Mi Nguyen | Pascal Engeli | Paolo Penna
Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes, VLDB 2018. (PDF, 1849 KB) | Timon Stampfli | Catharina Dekker | Michael Böhlen