PhD Thesis Defence 29.05.2015

Time series data is prominent in many real-world applications, e.g., hydrology or financial stock markets. In many of these applications, time series data is missing in blocks, i.e., runs of multiple consecutive missing values. For example, in hydrology around 20% of the data is missing in blocks. However, many time series analysis tasks, such as prediction, require complete data. Recovering blocks of missing values is challenging when the missing block is a peak or a valley, and even more so in real-world time series because the data is irregular. State-of-the-art recovery techniques are suited either to single missing values or to blocks of missing values in regular time series. The goal of this thesis is to accurately recover blocks of missing values in irregular time series. The recovery solution we propose is based on matrix decomposition techniques. The main idea is to represent correlated time series as columns of an input matrix in which the missing values have been initialized, and to iteratively apply a matrix decomposition technique to refine the initialized values. A key property of our recovery solution is that it learns the shape, the width and the amplitude of the missing blocks from the history of the time series that contains them and from the history of its correlated time series. Our experiments on real-world hydrological time series show that our approach outperforms state-of-the-art recovery techniques for the recovery of missing blocks in irregular time series. The recovery solution is implemented as a graphical tool that displays, browses and accurately recovers missing blocks in irregular time series. The proposed approach supports learning from both highly and lowly correlated time series.
This is important since lowly correlated time series, e.g., shifted time series, that exhibit shape and/or trend similarities are beneficial to include in the recovery process. We reduce the space complexity of the proposed solution from quadratic to linear, which allows the use of time series with long histories without prior segmentation. We prove the scalability and the correctness of the solution.
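
The iterative refinement described in the abstract can be illustrated with a minimal sketch. The abstract does not name the decomposition used, so this sketch makes several assumptions: a rank-1 truncated decomposition computed by power iteration stands in for the thesis's technique, missing values are initialized with the column mean, and the function names `rank1_approx` and `recover` are hypothetical. It is not the thesis's algorithm, only an instance of the general idea of initializing missing values and refining them by repeated low-rank approximation.

```python
def rank1_approx(X, iters=100):
    """Rank-1 approximation of X (a list of rows) via power iteration.

    Assumption: a rank-1 truncation stands in for the decomposition
    used in the thesis, which the abstract does not specify.
    """
    n, m = len(X), len(X[0])
    v = [1.0 / m ** 0.5] * m  # unit-norm starting vector
    for _ in range(iters):
        # u = X v, then w = X^T u; normalising w gives the next v
        u = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
    # X v v^T is the projection of X onto the dominant right direction
    return [[u[i] * v[j] for j in range(m)] for i in range(n)]


def recover(series, max_iter=200, tol=1e-9):
    """Fill missing values (None) in equal-length, correlated time series.

    Each series becomes one column of the input matrix. Missing entries
    are initialized with the column mean (an assumption) and then
    repeatedly overwritten with the low-rank approximation's values
    until they stop changing. Observed values are never modified.
    """
    n, m = len(series[0]), len(series)
    missing = [(i, j) for j in range(m) for i in range(n)
               if series[j][i] is None]
    X = [[0.0] * m for _ in range(n)]
    for j, col in enumerate(series):
        observed = [x for x in col if x is not None]
        mean = sum(observed) / len(observed)
        for i in range(n):
            X[i][j] = col[i] if col[i] is not None else mean
    for _ in range(max_iter):
        A = rank1_approx(X)
        change = 0.0
        for i, j in missing:
            change = max(change, abs(A[i][j] - X[i][j]))
            X[i][j] = A[i][j]  # refine only the missing entries
        if change < tol:
            break
    # return the result as a list of series (columns)
    return [[X[i][j] for i in range(n)] for j in range(m)]


# Toy example: b is a scaled copy of a with a block of two missing values.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 4.0, None, None, 10.0, 12.0]
recovered = recover([a, b])
print(recovered[1])
```

In this toy example the two series are perfectly correlated, so the refined values move from the mean initialization toward the values implied by the reference series. Real hydrological series are irregular and only partially correlated, which is the setting the thesis addresses and which this sketch does not attempt to reproduce.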

Where and When: May 29, 15:45-16:45; Room 1.D.06.