Fast Label Extraction in the CDAWG

dc.contributor.authorBelazzougui, Djamal
dc.contributor.authorCunial, Fabio
dc.date.accessioned2017-09-23T11:43:31Z
dc.date.available2017-09-23T11:43:31Z
dc.date.issued2017-09-06
dc.description.abstractThe compact directed acyclic word graph (CDAWG) of a string T of length n takes space proportional just to the number e of right extensions of the maximal repeats of T, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which e grows significantly more slowly than n. We reduce from O(m log ⁡log⁡ n) to O(m) the time needed to count the number of occurrences of a pattern of length m, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from O(m log log n+occ) to O(m+occ) in the time needed to locate all the occocc occurrences of the pattern. We also reduce from O(k log ⁡log ⁡n) to O(k) the time needed to read the k characters of the label of an edge of the suffix tree of T, and we reduce from O(m log ⁡log ⁡n) to O(m) the time needed to compute the matching statistics between a query of length m and T, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.fr_FR
dc.identifier.isbn978-3-319-67427-8fr_FR
dc.identifier.isbn978-3-319-67428-5fr_FR
dc.identifier.urihttp://dl.cerist.dz/handle/CERIST/898
dc.publisherSpringerfr_FR
dc.relation.ispartofseriesInternational Symposium on String Processing and Information Retrieval (SPIRE);10508
dc.relation.pages161-175fr_FR
dc.relation.placePalerme, Italiefr_FR
dc.structureCalcul pervasif et mobile (Pervasive and Mobile Computing group)fr_FR
dc.subjectCDAWGfr_FR
dc.subjectSuffix treefr_FR
dc.subjectMaximal repeatfr_FR
dc.subjectStraight-line programfr_FR
dc.subjectCount queryfr_FR
dc.subjectLocate queryfr_FR
dc.subjectMatching statisticsfr_FR
dc.subjectMinimal absent wordsfr_FR
dc.titleFast Label Extraction in the CDAWGfr_FR
dc.typeConference paper
Files