Consider the supermarket application of Figure 1. The supermarket records data related to purchases of clients (a), online shopping carts (b), and inventory (c). At each time point (e.g., a day), the supermarket aims at predicting the products that clients want to buy or order versus those that it has in stock. For example, tuple ('milk', a1, [2,10), 0.3) captures that, at each day from the 2nd to the 10th of the month, “milk is bought" with probability 0.3. There is no other tuple in a that predicts the probability of buying 'milk' over an interval overlapping with [2,10). Assume the supermarket wants to determine, at each time point, the probability that a product is in stock but no client wants to order or buy this product. The corresponding query is Q = c-Tp (a ∪Tp b), i.e., the union of relations a and b, followed by a difference with relation c (see Fig. 1b). Answer tuple ('milk', c1 ∧¬a1, [2,4), 0.42) (see Fig. 1c) expresses that, with probability 0.42, 'milk' is in stock but is not ordered or bought during interval [2,4)
In temporal-probabilistic (TP) databases, the combination of the temporal and the probabilistic dimension adds significant overhead to the computation of set operations. Although set queries are guaranteed to yield linearly sized output relations, all of the existing solutions exhibit a quadratic runtime complexity. They suffer from redundant interval comparisons and additional joins for the formation of lineage expressions.
For the efficient computation of TP set operations, we introduce the lineage-aware temporal window, a mechanism that binds intervals with lineage expressions. A lineage-aware temporal window associates candidate output intervals with the lineage expressions of the valid input tuples. It has schema (F, [winTs, winTe), lambda_r, lambda_s). F is a fact included in tuples over the interval [winTs, winTe). lambda_r and lambda_s are the lineage expressions of the input tuples of the left input relation r and the right input relation s, respectively, that are valid over [winTs, winTe) and include F. The flexibility of the lineage-aware temporal window is based on the fact that the lineages of valid tuples of each input relation are separately recorded. Thus, the direct filtering of irrelevant intervals and the finalization of output lineage expressions are possible. Given a TP set operation, lambda_r and lambda_s can be used to determine if fact F and interval [winTs, winTe) yield an output tuple. If this is the case, lambda_r and lambda_s are combined accordingly to form the lineage expression of this output tuple.
The source code used for the experiments in this work consists of executor algorithms for the lineage-aware window advancer and for set operations. It can be found here.
- K. Papaioannou, M. Theobald, M. H. Böhlen
Supporting Set Operations in Temporal-Probabilistic Databases,
(to be presented in ICDE 2018)