CERIST DIGITAL LIBRARY
http://www.cerist.dz:80
Le système de dépôt numérique DL collecte, emmagasine, indexe, archive, et diffuse du matériel de recherche en format numérique.2017-08-04T03:21:18ZFlexible Indexing of Repetitive Collections
http://dl.cerist.dz/handle/CERIST/896
Flexible Indexing of Repetitive Collections
Belazzougui, Djamal; Cunial, Fabio; Gagie, Travis; Prezza, Nicola; Raffinot, Mathieu
Highly repetitive strings are increasingly being amassed by
genome sequencing experiments, and by versioned archives of source code
and webpages. We describe practical data structures that support count-
ing and locating all the exact occurrences of a pattern in a repetitive
text, by combining the run-length encoded Burrows-Wheeler transform
(RLBWT) with the boundaries of Lempel-Ziv 77 factors. One such vari-
ant uses an amount of space comparable to LZ77 indexes, but it answers
count queries between two and four orders of magnitude faster than
all LZ77 and hybrid index implementations, at the cost of slower lo-
cate queries. Combining the RLBWT with the compact directed acyclic
word graph answers locate queries for short patterns between four and
ten times faster than a version of the run-length compressed suffix ar-
ray (RLCSA) that uses comparable memory, and with very short pat-
terns our index achieves speedups even greater than ten with respect to
RLCSA.
2017-06-07T00:00:00ZRepresenting the Suffix Tree with the CDAWG
http://dl.cerist.dz/handle/CERIST/895
Representing the Suffix Tree with the CDAWG
Belazzougui, Djamal; Cunial, Fabio
Given a string T, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with e_T arcs, taking overall O(e_T+e_REV(T)) words of space, where REV(T) is the reverse of T, and supporting some key operations in time between O(1) and O(log(log(n))) in the worst case. This representation is especially appealing for highly repetitive strings, like collections of similar genomes or of version-controlled documents, in which e_T grows sublinearly in the length of T in practice. In this paper we augment such representation, supporting a number of additional queries in worst-case time between O(1) and O(log(n)) in the RAM model, without increasing space complexity asymptotically. Our technique, based on a heavy path decomposition of the suffix tree, enables also a representation of the suffix array, of the inverse suffix array, and of T itself, that takes O(e_T) words of space, and that supports random access in O(log(n)) time. Furthermore, we establish a connection between the reversed CDAWG of T and a context-free grammar that produces T and only T, which might have independent interest.
2017-07-04T00:00:00ZNew Technique to Deal With Verbose Queries in Social Book Search
http://dl.cerist.dz/handle/CERIST/894
New Technique to Deal With Verbose Queries in Social Book Search
Chaa, Messaoud; Nouali, Omar; Bellot, Patrice
Verbose query reduction and query term weighting are automatic techniques to deal with verbose queries.
2017-01-01T00:00:00ZNew GPU-based Swarm Intelligence Approach For Reducing Big Association Rules Space
http://dl.cerist.dz/handle/CERIST/893
New GPU-based Swarm Intelligence Approach For Reducing Big Association Rules Space
Djenouri, Youcef; Bendjoudi, Ahcène; Djenouri, Djamel; Belhadi, Asma; Nouali-Taboudjemat, Nadia
This paper deals with exploration and mining of association rules in big data, with the big challenge of increasing computation time. We propose a new approach based on meta-rules discovery that gives to the user the summary of the rules’ space through a meta-rules representation. This allows the user
to decide about the rules to take and prune. We also adapt a pruning strategy of our previous work to keep only the representatives rules. As the meta-rules space is much larger than the rules space, two approaches are proposed for efficient exploitation. The first one uses a bees swarm optimization method
in the meta-rules discovery process, which is extended using GPU-based parallel programming to form the second one. The sequential version has been first tested using medium rules set, and the results show clear improvement in terms of the number of returned meta-rules. The two versions have then been compared on large scale rules sets, and the results illustrate the acceleration on the summarization process by the parallel approach without reducing the quality of resulted meta-rules. Further experiments on Webdocs big data instances reveal that the proposed method of pruning rules by summarizing meta-rules considerably reduces the association rules space compared to state-of-the-art association rules mining-based approaches.
2017-06-14T00:00:00Z