International Conference Papers

Permanent URI for this collection: http://dl.cerist.dz/handle/CERIST/4

  • Weighted Ancestors in Suffix Trees Revisited
    (Schloss Dagstuhl - Leibniz-Zentrum für Informatik 2021, 2021-06-30) Belazzougui, Djamal; Kosolobov, Dmitry; Puglisi, Simon J.; Raman, Rajeev
    The weighted ancestor problem is a well-known generalization of the predecessor problem to trees. It is known to require Ω(log log n) time for queries provided 𝒪(n polylog n) space is available and weights are from [0..n], where n is the number of tree nodes. However, when applied to suffix trees, the problem, surprisingly, admits an 𝒪(n)-space solution with constant query time, as was shown by Gawrychowski, Lewenstein, and Nicholson (Proc. ESA 2014). This variant of the problem can be reformulated as follows: given the suffix tree of a string s, we need a data structure that can locate in the tree any substring s[p..q] of s in 𝒪(1) time (as if one descended from the root reading s[p..q] along the way). Unfortunately, the data structure of Gawrychowski et al. has no efficient construction algorithm, limiting its wider usage as an algorithmic tool. In this paper we resolve this issue, describing a data structure for weighted ancestors in suffix trees with constant query time and a linear construction algorithm. Our solution is based on a novel approach using so-called irreducible LCP values.
  • Computing the Antiperiod(s) of a String
    (Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2019-06-18) Alamro, Hayam; Badkobeh, Golnaz; Belazzougui, Djamal; Iliopoulos, Costas S.; Puglisi, Simon J.
    A string S[1, n] is a power (or repetition or tandem repeat) of order k and period n/k if it can be decomposed into k consecutive identical blocks of length n/k. Powers and periods are fundamental structures in the study of strings, and algorithms to compute them efficiently have been widely studied. Recently, Fici et al. (Proc. ICALP 2016) introduced an antipower of order k to be a string composed of k distinct blocks of the same length, n/k, called the antiperiod. An arbitrary string will have antiperiod t if it is a prefix of an antipower with antiperiod t. In this paper, we describe an efficient algorithm for computing the smallest antiperiod of a string S of length n in O(n) time. We also describe an algorithm to compute all the antiperiods of S that runs in O(n log n) time.
  • Bidirectional Variable-Order de Bruijn Graphs
    (Springer International Publishing, 2016-03-22) Belazzougui, Djamal; Gagie, Travis; Mäkinen, Veli; Previtali, Marco; Puglisi, Simon J.
    Implementing de Bruijn graphs compactly is an important problem because of their role in genome assembly. There are currently two main approaches, one using Bloom filters and the other using a kind of Burrows-Wheeler Transform on the edge labels of the graph. The second representation is more elegant and can even handle many graph-orders at once, but it does not cleanly support traversing edges backwards or inserting new nodes or edges. In this paper we resolve the first of these issues and partially address the second.
  • Lempel-Ziv Decoding in External Memory
    (Springer International Publishing, 2016-06-01) Belazzougui, Djamal; Kärkkäinen, Juha; Kempa, Dominik; Puglisi, Simon J.
    Simple and fast decoding is one of the main advantages of LZ77-type text encoding used in many popular file compressors such as gzip and 7zip. With the recent introduction of external memory algorithms for Lempel–Ziv factorization, there is a need for external memory LZ77 decoding, but the standard algorithm makes random accesses to the text and cannot be trivially modified for external memory computation. We describe the first external memory algorithms for LZ77 decoding, prove that their I/O complexity is optimal, and demonstrate that they are very fast in practice, only about three times slower than in-memory decoding (when reading input and writing output is included in the time).
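
To illustrate why the standard LZ77 decoder mentioned in the last abstract relies on random access to the text, here is a minimal in-memory sketch. It is not the external memory algorithm from the paper. The phrase representation (a single literal character, or a `(source position, length)` pair pointing into the already decoded text) and the name `lz77_decode` are assumptions made for this example.

```python
from typing import List, Tuple, Union

# Assumed phrase format (not taken from the paper): either a single literal
# character, or a (source position, length) pair referring to decoded text.
Phrase = Union[str, Tuple[int, int]]


def lz77_decode(phrases: List[Phrase]) -> str:
    """Decode a list of LZ77-style phrases entirely in memory."""
    out: List[str] = []
    for phrase in phrases:
        if isinstance(phrase, str):
            if len(phrase) != 1:
                raise ValueError("a literal phrase must be one character")
            out.append(phrase)
        else:
            src, length = phrase
            if length <= 0 or not (0 <= src < len(out)):
                raise ValueError("copy phrase must point into decoded text")
            # Copy one symbol at a time so that a phrase may overlap itself.
            # Each out[src + i] is a read at an arbitrary earlier position,
            # which is the random access that is costly in external memory.
            for i in range(length):
                out.append(out[src + i])
    return "".join(out)
```

For instance, `lz77_decode(["a", "b", (0, 4), "c"])` yields `"abababc"`: the copy phrase starts at position 0 and runs past the text that existed when the copy began.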