Browsing by Author "Gagie, Travis"
Now showing 1 - 6 of 6
Results Per Page
Sort Options
- ItemBidirectional Variable-Order de Bruijn Graphs(Springer International Publishing, 2016-03-22) Belazzougui, Djamal; Gagie, Travis; Mäkinen, Veli; Previtali, Marco; Puglisi, Simon J.Implementing de Bruijn graphs compactly is an important problem because of their role in genome assembly. There are currently two main approaches, one using Bloom filters and the other using a kind of Burrows-Wheeler Transform on the edge labels of the graph. The second representation is more elegant and can even handle many graph-orders at once, but it does not cleanly support traversing edges backwards or inserting new nodes or edges. In this paper we resolve the first of these issues and partially address the second.
- ItemBidirectional Variable-Order de Bruijn Graphs(World Scientific Publishing, 2018-12) Belazzougui, Djamal; Gagie, Travis; Mäkinen, Veli; Previtali, Marco; Puglisi, Simon J.Compressed suffix trees and bidirectional FM-indexes can store a set of strings and support queries that let us explore the set of substrings they contain, adding and deleting characters on both the left and right, but they can use much more space than a de Bruijn graph for the strings. Bowe et al.’s BWT-based de Bruijn graph representation (Proc. Workshop on Algorithms for Bioinformatics, pp. 225–235, 2012) can be made bidirectional as well, at the cost of increasing its space usage by a small constant, but it fixes the length of the substrings. Boucher et al. (Proc. Data Compression Conference, pp. 383–392, 2015) generalized Bowe et al.’s representation to support queries about variable-length substrings, but at the cost of bidirectionality. In this paper we show how to make Boucher et al.’s variable-order implementation of de Bruijn graphs bidirectional.
- ItemBlock Trees(ELSEVIER, 2021-05) Belazzougui, Djamal; Cáceres, Manuel; Gagie, Travis; Gawrychowski, Paweł; Kärkkäinen, Juha; Navarro, Gonzalo; Ordóñez, Alberto; Puglisi, Simon J.; Tabei, YasuoLet string S[1..n] be parsed into z phrases by the Lempel-Ziv algorithm. The corresponding compression algorithm encodes S in 𝒪(z) space, but it does not support random access to S. We introduce a data structure, the block tree, that represents S in 𝒪(z log(n/z)) space and extracts any symbol of T in time 𝒪(log(n/z)), among other space-time tradeoffs. By multiplying the space by the alphabet size, we also support rank and select queries, which are useful for building compressed data structures on top of S. Further, block trees can be built in a scalable manner. Our experiments show that block trees offer relevant space-time tradeoffs compared to other compressed string representations for highly repetitive strings.
- ItemFlexible Indexing of Repetitive Collections(Springer, 2017-06-07) Belazzougui, Djamal; Cunial, Fabio; Gagie, Travis; Prezza, Nicola; Raffinot, MathieuHighly repetitive strings are increasingly being amassed by genome sequencing experiments, and by versioned archives of source code and webpages. We describe practical data structures that support count- ing and locating all the exact occurrences of a pattern in a repetitive text, by combining the run-length encoded Burrows-Wheeler transform (RLBWT) with the boundaries of Lempel-Ziv 77 factors. One such vari- ant uses an amount of space comparable to LZ77 indexes, but it answers count queries between two and four orders of magnitude faster than all LZ77 and hybrid index implementations, at the cost of slower lo- cate queries. Combining the RLBWT with the compact directed acyclic word graph answers locate queries for short patterns between four and ten times faster than a version of the run-length compressed suffix ar- ray (RLCSA) that uses comparable memory, and with very short pat- terns our index achieves speedups even greater than ten with respect to RLCSA.
- ItemFully Dynamic de Bruijn Graphs(Springer International Publishing, 2016-09-21) Belazzougui, Djamal; Gagie, Travis; Mäkinen, Veli; Previtali, MarcoWe present a space- and time-efficient fully dynamic implementation of de Bruijn graphs, which can also support fixed-length jumbled pattern matching.
- ItemRange Majorities and Minorities in Arrays(Springer Nature, 2021-03-19) Belazzougui, Djamal; Gagie, Travis; Munro, J. Ian; Navarro, Gonzalo; Nekrich, YakovThe problem of parameterized range majority asks us to preprocess a string of length n such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold τ. This is a more tractable version of the classical problem of finding the range mode, which is unlikely to be solvable in polylogarithmic time and linear space. In this paper we give the first linear-space solution with optimal 𝓞(1/τ ) query time, even when τ can be specified with the query. We then consider data structures whose space is bounded by the entropy of the distribution of the symbols in the sequence. For the case when the alphabet size is polynomial on the computer word size, we retain the optimal time within optimally compressed space (i.e., with sublinear redundancy). Otherwise, either the compressed space is increased by an arbitrarily small constant factor or the time rises to any function in (1/τ ) · ω(1). We obtain the same results on the complementary problem of parameterized range minority.