Academic & Scientific Articles

Permanent URI for this communityhttp://dl.cerist.dz/handle/CERIST/3

Browse

Search Results

Now showing 1 - 3 of 3
  • Thumbnail Image
    Item
    Bidirectional Variable-Order de Bruijn Graphs
    (World Scientific Publishing, 2018-12) Belazzougui, Djamal; Gagie, Travis; Mäkinen, Veli; Previtali, Marco; Puglisi, Simon J.
    Compressed suffix trees and bidirectional FM-indexes can store a set of strings and support queries that let us explore the set of substrings they contain, adding and deleting characters on both the left and right, but they can use much more space than a de Bruijn graph for the strings. Bowe et al.’s BWT-based de Bruijn graph representation (Proc. Workshop on Algorithms for Bioinformatics, pp. 225–235, 2012) can be made bidirectional as well, at the cost of increasing its space usage by a small constant, but it fixes the length of the substrings. Boucher et al. (Proc. Data Compression Conference, pp. 383–392, 2015) generalized Bowe et al.’s representation to support queries about variable-length substrings, but at the cost of bidirectionality. In this paper we show how to make Boucher et al.’s variable-order implementation of de Bruijn graphs bidirectional.
  • Thumbnail Image
    Item
    A framework for space-efficient read clustering in metagenomic samples
    (BioMed Central, 2017-03-14) Alanko, Jarno; Cunial, Fabio; Belazzougui, Djamal; Mäkinen, Veli
    Background: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed. Results: We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t+logσ)) time and just 2n+o(n)+O(max{ℓ σlogn,K logm}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure. Conclusions: Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art.
  • Thumbnail Image
    Item
    Bidirectional Variable-Order de Bruijn Graphs
    (Springer International Publishing, 2016-03-22) Belazzougui, Djamal; Gagie, Travis; Mäkinen, Veli; Previtali, Marco; Puglisi, Simon J.
    Implementing de Bruijn graphs compactly is an important problem because of their role in genome assembly. There are currently two main approaches, one using Bloom filters and the other using a kind of Burrows-Wheeler Transform on the edge labels of the graph. The second representation is more elegant and can even handle many graph-orders at once, but it does not cleanly support traversing edges backwards or inserting new nodes or edges. In this paper we resolve the first of these issues and partially address the second.