Fast matching statistics in small space

dc.contributor.author: Belazzougui, Djamal
dc.contributor.author: Cunial, Fabio
dc.contributor.author: Denas, Olgert
dc.description.abstract: Computing the matching statistics of a string S with respect to a string T over an alphabet of size σ is a fundamental primitive for many large-scale string analysis applications. These include the comparison of entire genomes, where space is a pressing issue. This paper brings from theory to practice an existing algorithm that uses just O(|T| log σ) bits of space and computes a compact encoding of the matching statistics array in O(|S| log σ) time. The techniques used to speed up the algorithm are of general interest because they optimize two operations: queries on whether a Weiner link exists from a node of the suffix tree, and parent operations after unsuccessful Weiner links. They can therefore be applied to other matching statistics algorithms, and to any suffix tree traversal that relies on such calls. Some of our optimizations yield a matching statistics implementation that is up to three times faster than a plain version of the algorithm, depending on the similarity between S and T. On genomic datasets of practical significance we achieve speedups of up to 1.8×. However, our fastest implementations take on average twice the time of an existing implementation based on the LCP array. The key advantage is that our implementations need between one half and one fifth of the competitor's memory, and their running times approach the competitor's when S and T are very similar. [fr_FR]
dc.publisher: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik [fr_FR]
dc.relation.ispartofseries: Leibniz International Proceedings in Informatics (LIPIcs); 103
dc.structure: Calcul pervasif et mobile (Pervasive and Mobile Computing group) [fr_FR]
dc.subject: Matching statistics [fr_FR]
dc.subject: maximal repeat [fr_FR]
dc.subject: Burrows-Wheeler transform [fr_FR]
dc.subject: wavelet tree [fr_FR]
dc.subject: suffix tree topology [fr_FR]
dc.title: Fast matching statistics in small space [fr_FR]
dc.type: Conference paper
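The abstract centres on two operations: testing whether a Weiner link exists, and taking a parent step after a Weiner link fails. To illustrate where they occur, the following Python sketch computes matching statistics right to left, using one common definition: MS[i] is the length of the longest prefix of S[i..] that occurs in T.

The sketch is not the paper's algorithm and makes strong simplifying assumptions:

* It builds a plain, uncompressed BWT of T with explicit counting tables. It does not use the O(|T| log σ)-bit wavelet-tree and suffix-tree-topology structures of the paper.
* It constructs the suffix array naively.
* After a failed link, it shortens the match by one character and recomputes the interval from scratch. This is a slower stand-in for a real suffix tree parent operation.

The names `BWTIndex`, `weiner_link`, `interval_of` and `matching_statistics` are hypothetical. The NUL sentinel is assumed to occur in neither string.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open range [sp, ep) of suffix-array rows


class BWTIndex:
    """Hypothetical, uncompressed BWT index of T (illustration only)."""

    def __init__(self, t: str) -> None:
        text = t + "\0"  # assumed sentinel: smaller than every character, absent from S and T
        self.n = len(text)
        # Naive suffix array: sort suffix start positions lexicographically.
        sa = sorted(range(self.n), key=lambda i: text[i:])
        # BWT[k] is the character preceding the k-th smallest suffix (text[-1] wraps to the sentinel).
        self.bwt = "".join(text[i - 1] for i in sa)
        counts: Dict[str, int] = {}
        for ch in text:
            counts[ch] = counts.get(ch, 0) + 1
        # C[c] = number of characters in the text that are smaller than c.
        self.C: Dict[str, int] = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[c][k] = number of occurrences of c in bwt[:k].
        self.occ: Dict[str, List[int]] = {ch: [0] * (self.n + 1) for ch in counts}
        for k, ch in enumerate(self.bwt):
            for c, row in self.occ.items():
                row[k + 1] = row[k] + (1 if c == ch else 0)

    def weiner_link(self, iv: Interval, c: str) -> Optional[Interval]:
        """Map the interval of W to the interval of cW, or return None if cW does not occur in T."""
        if c not in self.C:
            return None
        sp, ep = iv
        nsp = self.C[c] + self.occ[c][sp]
        nep = self.C[c] + self.occ[c][ep]
        return (nsp, nep) if nsp < nep else None

    def interval_of(self, p: str) -> Optional[Interval]:
        """Backward search for p from scratch, starting at the interval of the empty string."""
        iv: Optional[Interval] = (0, self.n)
        for c in reversed(p):
            iv = self.weiner_link(iv, c)
            if iv is None:
                return None
        return iv


def matching_statistics(s: str, t: str) -> List[int]:
    """MS[i] = length of the longest prefix of s[i:] occurring in t (sketch)."""
    idx = BWTIndex(t)
    ms = [0] * len(s)
    ell = 0                 # invariant: iv is the interval of s[i+1 : i+1+ell]
    iv: Interval = (0, idx.n)
    for i in range(len(s) - 1, -1, -1):
        while True:
            nxt = idx.weiner_link(iv, s[i])   # does a Weiner link by s[i] exist?
            if nxt is not None:
                iv, ell = nxt, ell + 1
                break
            if ell == 0:                      # s[i] does not occur in t at all
                break
            # Failed link: drop one character from the right end. This is a naive
            # stand-in for the suffix tree parent operation.
            ell -= 1
            shorter = idx.interval_of(s[i + 1 : i + 1 + ell])
            assert shorter is not None        # a prefix of an occurring string also occurs
            iv = shorter
        ms[i] = ell
    return ms
```

For example, `matching_statistics("banana", "ananas")` returns `[0, 5, 4, 3, 2, 1]`: "b" does not occur in T, while "anana" does. Every iteration of the inner loop performs either a Weiner-link test or a shortening step, so those two operations dominate the running time. This is consistent with the abstract's focus on exactly these calls.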