Seminar Database Systems (PhD, MSc, BSc)

Organization: Michael Böhlen and Przemyslaw Uznanski
Teaching language: English
Level: PhD, MSc and advanced BSc students
Academic Year: Spring 2018

Friday 23.2.2018, 14.15-16.00h   
Saturday 28.4.2018, 8.15 - 16.00h 
Saturday 19.5.2018, 8.15 - 16.00h

Overview and objectives: The area of this year's seminar is Locality-Sensitive Hashing, Similarity, and Nearest Neighbours. Students learn how to critically read and study research papers, how to summarize the contents of a paper, and how to present it in a seminar.

Teaching format: Each participant writes a self-contained report of about 10 pages and gives a 30 minutes presentation (blackboard, without a computer). Each participant has a buddy. Buddies read the report, make suggestions for improvements, and help with the presentation (e.g., dry runs). The first version of the report is due two weeks before the date of the presentation, and will be discussed with the buddy and the professor about one week before the presentation. The final versions of the report are due on the day of the seminar.

Setup and Organization: The setup of the seminar will be discussed Friday February 23, 2018 from 14:15 until 15:00 in room CAB H 52 at ETHZ. At the first meeting the available slots for the seminar will be distributed and papers will be assigned. Here are the slides of the first meeting (PDF, 64 KB). How to give talks and read papers: link. The presentations take place during Saturday April 28 and Saturday May 19 (all day). Participation at all three meetings is compulsory. The assessment depends on the quality of the report, presentation, active participation during the seminar, and input as a buddy.

Tentative list of seminar papers:

topic presenter buddy professor
(1) Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams (PDF, 531 KB)      
(2) Hokusai — Sketching Streams in Real Time (PDF, 3359 KB)      
(3) An Improved Data Stream Summary: The Count-Min Sketch and its Applications (PDF, 143 KB)      
(6) Sketching and Embedding are Equivalent for Norm (PDF, 299 KB)      
(7) Overcoming the l1 Non-Embeddability Barrier: Algorithms for Product Metrics (PDF, 312 KB)      
(8) Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Subconstant Error (PDF, 165 KB)      
(9) Earth Mover’s Distance based Similarity Search at Scale (PDF, 1012 KB)      
(10) Practical and Optimal LSH for Angular Distance (PDF, 358 KB)      
(11) Dimensionality Reduction Techniques for Proximity Problems (PDF, 214 KB)      
(12) Comparing Data Streams Using Hamming Norms (How to Zero In) (PDF, 148 KB)      
(13) Similarity Search in High Dimensions via Hashing (PDF, 303 KB)      
(14) Locality-Sensitive Hashing Scheme Based on p-Stable Distributions (PDF, 205 KB)      
(15) Set similarity search beyond MinHash (PDF, 1169 KB)      
(16) Syntactic clustering of the Web (PDF, 213 KB)      
(17) Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions (PDF, 171 KB)      
(18) Approximate nearest neighbors: towards removing the curse of dimensionality (PDF, 1298 KB)      
(19) An Optimal Algorithm for the Distinct Elements Problem (PDF, 331 KB)      
(20) Algorithms for lp Low-Rank Approximation (PDF, 2174 KB)