Random input helps searching predecessors
Date
2018-06-17
Journal Title
Journal ISSN
Volume Title
Publisher
CEUR-WS.org
Abstract
A data structure problem consists of the finite sets: D of data, Q of
queries, A of query answers, associated with a function f : D ×Q → A.
The data structure of file X is “static” (“dynamic”) if we “do not”
(“do”) require quick updates as X changes. An important goal is to
compactly encode a file X ∈ D, such that for each query y ∈ Q, function
f(X, y) requires the minimum time to compute an answer in A. This
goal is trivial if the size of D is large, since for each query y ∈ Q,
it was shown that f(X, y) requires O(1) time for the most important
queries in the literature. Hence, this goal becomes interesting to study
as a trade off between the “storage space” and the “query time”, both
measured as functions of the file size n = |X|. The ideal solution would
be to use linear O(n) = O(|X|) space, while retaining a constant O(1)
query time. However, if f(X, y) computes the static predecessor search
(find largest x ∈ X : x ≤ y), then Ajtai [Ajt88] proved a negative
result. By using just n
O(1) = |X|
O(1) data space, then it is not possible
to evaluate f(X, y) in O(1) time ∀y ∈ Q. The proof exhibited a bad
distribution of data D, such that ∃y
∗ ∈ Q (a “difficult” query y
∗
), that
f(X, y∗
) requires ω(1) time. Essentially [Ajt88] is an existential result,
resolving the worst case scenario. But, [Ajt88] left open the question:
do we typically, that is, with high probability (w.h.p.) 1
encounter
such “difficult” queries y ∈ Q, when assuming reasonable distributions
with respect to (w.r.t.) queries and data? Below we make reasonable
assumptions w.r.t. the distribution of the queries y ∈ Q, as well as
w.r.t. the distribution of data X ∈ D. In two interesting scenarios
studied in the literature, we resolve the typical (w.h.p.) query time.
Description
Keywords
Predecessor search, Interpolation search, Randomly distributed queries, Smooth data distribution, Dynamic data structures