Fast Label Extraction in the CDAWG

Belazzougui, Djamal; Cunial, Fabio

Date

2017-09-06

Authors

Belazzougui, Djamal

Cunial, Fabio

Publisher

Springer

Abstract

The compact directed acyclic word graph (CDAWG) of a string T of length n takes space proportional just to the number e of right extensions of the maximal repeats of T, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which e grows significantly more slowly than n. We reduce from O(m log log n) to O(m) the time needed to count the number of occurrences of a pattern of length m, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from O(m log log n + occ) to O(m + occ) in the time needed to locate all the occ occurrences of the pattern. We also reduce from O(k log log n) to O(k) the time needed to read the k characters of the label of an edge of the suffix tree of T, and we reduce from O(m log log n) to O(m) the time needed to compute the matching statistics between a query of length m and T, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.
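
For readers unfamiliar with straight-line programs, the following is a minimal sketch of the general idea of reading k characters out of one. A straight-line program is a grammar in which every rule is either a single character or the concatenation of two earlier rules, so it generates exactly one string. The sketch is not the data structure of the paper: the names `Node`, `leaf`, `concat` and `extract` are hypothetical, the example grammar is an arbitrary Fibonacci-like one rather than the one induced by the reversed CDAWG, and the plain top-down descent shown here does not achieve the O(k) bound stated in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One SLP rule: a single character, or the concatenation left + right."""
    char: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    length: int = 1  # length of the string this rule expands to


def leaf(c: str) -> Node:
    return Node(char=c)


def concat(left: Node, right: Node) -> Node:
    return Node(left=left, right=right, length=left.length + right.length)


def _collect(node: Node, start: int, k: int, out: List[str]) -> None:
    # Nothing to read if the request is empty or starts past this expansion.
    if k <= 0 or start >= node.length:
        return
    if node.char is not None:
        out.append(node.char)
        return
    left_len = node.left.length
    if start < left_len:
        # The requested range begins in the left child and may spill over
        # into the right child.
        take = min(k, left_len - start)
        _collect(node.left, start, take, out)
        _collect(node.right, 0, k - take, out)
    else:
        _collect(node.right, start - left_len, k, out)


def extract(node: Node, start: int, k: int) -> str:
    """Return up to k characters of the expansion of node, from offset start."""
    out: List[str] = []
    _collect(node, start, k, out)
    return "".join(out)


# Example grammar (an assumption for illustration): rules are shared,
# so six rules describe a string of length eight.
x1 = leaf("b")
x2 = leaf("a")
x3 = concat(x2, x1)   # "ab"
x4 = concat(x3, x2)   # "aba"
x5 = concat(x4, x3)   # "abaab"
x6 = concat(x5, x4)   # "abaababa"

print(extract(x6, 2, 4))
```

Storing the expansion length in each rule is what lets the descent skip whole subtrees that lie outside the requested range without expanding them.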

Keywords

CDAWG, Suffix tree, Maximal repeat, Straight-line program, Count query, Locate query, Matching statistics, Minimal absent words

URI

http://dl.cerist.dz/handle/CERIST/898

Collections

International Conference Papers
