Flexible Indexing of Repetitive Collections

dc.contributor.authorBelazzougui, Djamal
dc.contributor.authorCunial, Fabio
dc.contributor.authorGagie, Travis
dc.contributor.authorPrezza, Nicola
dc.contributor.authorRaffinot, Mathieu
dc.description.abstractHighly repetitive strings are increasingly being amassed by genome sequencing experiments, and by versioned archives of source code and webpages. We describe practical data structures that support count- ing and locating all the exact occurrences of a pattern in a repetitive text, by combining the run-length encoded Burrows-Wheeler transform (RLBWT) with the boundaries of Lempel-Ziv 77 factors. One such vari- ant uses an amount of space comparable to LZ77 indexes, but it answers count queries between two and four orders of magnitude faster than all LZ77 and hybrid index implementations, at the cost of slower lo- cate queries. Combining the RLBWT with the compact directed acyclic word graph answers locate queries for short patterns between four and ten times faster than a version of the run-length compressed suffix ar- ray (RLCSA) that uses comparable memory, and with very short pat- terns our index achieves speedups even greater than ten with respect to RLCSA.fr_FR
dc.relation.ispartofseriesConference on Computability in Europe (CIE);10307
dc.relation.placeTurku, Finlande.fr_FR
dc.structureCalcul pervasif et mobile (Pervasive and Mobile Computing group)fr_FR
dc.subjectrepetitive string indexingfr_FR
dc.subjectrun-length encoded Burrows-Wheeler transformfr_FR
dc.subjectLempel-Ziv 77 compressionfr_FR
dc.titleFlexible Indexing of Repetitive Collectionsfr_FR
dc.typeConference paper