Recovery of Missing Values in Time Series
Time series data arise in a variety of domains, such as environmental monitoring, telecommunications, finance, and medicine. For example, in the field of hydrology, sensors are used to capture environmental phenomena including temperature, air pressure, and humidity at different points in time. For such data, it is not uncommon that more than 20% of the values are missing in blocks, i.e., multiple consecutive measurements are missing.
This project is divided into two main parts. In the first part, we propose a new technique that accurately approximates blocks of missing values in a large number of input time series with millions of observations. Using a matrix decomposition method, the technique should be able to restore the main missing trends, i.e., peaks and valleys.
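To make the idea of decomposition-based recovery concrete, the following is a minimal sketch and not the technique proposed in this project. It assumes a truncated singular value decomposition (the text above does not name a specific decomposition), NumPy, and a hypothetical helper called recover_missing. The time series are stored as the columns of a matrix, the gaps are first filled with column means, and the missing entries are then repeatedly replaced by the values of a low-rank approximation.

```python
import numpy as np

def recover_missing(X, rank=1, max_iter=200, tol=1e-8):
    """Fill NaN entries of X (rows = time points, columns = series).

    Hypothetical helper for illustration: iterated truncated SVD,
    assumed here as a stand-in for the matrix decomposition method.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise each gap with the mean of the observed values in its column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(max_iter):
        # Low-rank approximation of the current estimate.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Update only the missing entries; observed values are never changed.
        change = np.linalg.norm(approx[missing] - filled[missing])
        filled[missing] = approx[missing]
        if change < tol:
            break
    return filled

# Synthetic example: three correlated series, one with a missing block.
t = np.linspace(0, 4 * np.pi, 200)
truth = np.column_stack([np.sin(t), 2 * np.sin(t), -0.5 * np.sin(t)])
X = truth.copy()
X[80:120, 0] = np.nan  # 40 consecutive missing measurements
recovered = recover_missing(X, rank=1)
```

In this synthetic setting the series are exact multiples of each other, so the correlated columns carry enough information to rebuild the peaks and valleys inside the gap. Real sensor data would be noisier and would likely need a higher rank and a more careful stopping rule.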
The second part of the project deals with the efficiency of the proposed technique. The execution time and the memory I/O of the algorithm should be as low as possible. To this end, two directions will be investigated. In the first direction, we will propose a better approximation of the matrix decomposition method used for the recovery of missing values. In the second direction, we will propose an SQL-based implementation of the matrix decomposition method, so that the matrices used during the decomposition no longer have to be stored in main memory.
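As an illustration of the SQL-based direction, the sketch below shows one possible building block, a matrix product evaluated entirely inside the database. It is an assumption-laden example, not the project's implementation: the tables A and B, their (i, j, val) schema, and the load helper are hypothetical, and SQLite is used through Python's sqlite3 module only because it needs no setup.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; passing a file path instead
# would let the database engine keep the matrices on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (i INTEGER, j INTEGER, val REAL, PRIMARY KEY (i, j));
    CREATE TABLE B (i INTEGER, j INTEGER, val REAL, PRIMARY KEY (i, j));
""")

def load(table, rows):
    """Store a dense matrix as one (row index, column index, value) tuple per entry."""
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(rows) for j, v in enumerate(row)],
    )

load("A", [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 x 2 matrix
load("B", [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]])     # 2 x 3 matrix

# C = A * B: join on the shared inner index, then sum the products per cell.
product = conn.execute("""
    SELECT A.i, B.j, SUM(A.val * B.val)
    FROM A JOIN B ON A.j = B.i
    GROUP BY A.i, B.j
    ORDER BY A.i, B.j
""").fetchall()
```

Each matrix lives in a table rather than in an in-memory array, and the join plus aggregation is left to the database engine. A full decomposition would chain several such queries; how well this scales is exactly what the efficiency study would need to measure.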
An empirical and analytical comparison with existing methods will be carried out in order to validate the implementations.