A framework for space-efficient read clustering in metagenomic samples

Alanko, Jarno; Cunial, Fabio; Belazzougui, Djamal; Mäkinen, Veli

A framework for space-efficient read clustering in metagenomic samples

dc.contributor.author	Alanko, Jarno
dc.contributor.author	Cunial, Fabio
dc.contributor.author	Belazzougui, Djamal
dc.contributor.author	Mäkinen, Veli
dc.date.accessioned	2017-05-08T10:54:16Z
dc.date.available	2017-05-08T10:54:16Z
dc.date.issued	2017-03-14
dc.description.abstract	Background: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed. Results: We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t+logσ)) time and just 2n+o(n)+O(max{ℓ σlogn,K logm}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure. Conclusions: Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art.	fr_FR
dc.identifier.issn	1471-2105	fr_FR
dc.identifier.uri	http://dl.cerist.dz/handle/CERIST/891
dc.publisher	BioMed Central	fr_FR
dc.relation.ispartofseries	BMC Bioinformatics;VOLUME 18 SUPPLEMENT 3
dc.relation.pages	49--60	fr_FR
dc.structure	Calcul pervasif et mobile (Pervasive and Mobile Computing group)	fr_FR
dc.subject	Metagenomics	fr_FR
dc.subject	Read clustering	fr_FR
dc.subject	Text indexing	fr_FR
dc.subject	Burrows-Wheeler transform	fr_FR
dc.subject	Suffix-link tree	fr_FR
dc.subject	Right-maximal k-mer	fr_FR
dc.subject	Submaximal repeat	fr_FR
dc.title	A framework for space-efficient read clustering in metagenomic samples	fr_FR
dc.type	Article

Collections

International Journal Papers

A framework for space-efficient read clustering in metagenomic samples

Files

Collections