Random input helps searching predecessors

Thumbnail Image
Journal Title
Journal ISSN
Volume Title
A data structure problem consists of the finite sets: D of data, Q of queries, A of query answers, associated with a function f : D ×Q → A. The data structure of file X is “static” (“dynamic”) if we “do not” (“do”) require quick updates as X changes. An important goal is to compactly encode a file X ∈ D, such that for each query y ∈ Q, function f(X, y) requires the minimum time to compute an answer in A. This goal is trivial if the size of D is large, since for each query y ∈ Q, it was shown that f(X, y) requires O(1) time for the most important queries in the literature. Hence, this goal becomes interesting to study as a trade off between the “storage space” and the “query time”, both measured as functions of the file size n = |X|. The ideal solution would be to use linear O(n) = O(|X|) space, while retaining a constant O(1) query time. However, if f(X, y) computes the static predecessor search (find largest x ∈ X : x ≤ y), then Ajtai [Ajt88] proved a negative result. By using just n O(1) = |X| O(1) data space, then it is not possible to evaluate f(X, y) in O(1) time ∀y ∈ Q. The proof exhibited a bad distribution of data D, such that ∃y ∗ ∈ Q (a “difficult” query y ∗ ), that f(X, y∗ ) requires ω(1) time. Essentially [Ajt88] is an existential result, resolving the worst case scenario. But, [Ajt88] left open the question: do we typically, that is, with high probability (w.h.p.) 1 encounter such “difficult” queries y ∈ Q, when assuming reasonable distributions with respect to (w.r.t.) queries and data? Below we make reasonable assumptions w.r.t. the distribution of the queries y ∈ Q, as well as w.r.t. the distribution of data X ∈ D. In two interesting scenarios studied in the literature, we resolve the typical (w.h.p.) query time.
Predecessor search, Interpolation search, Randomly distributed queries, Smooth data distribution, Dynamic data structures