DSpace CERISTgeneral-feed.descriptionhttps://dl.cerist.dz2023-12-02T22:48:16Z2023-12-02T22:48:16Z8271Range Majorities and Minorities in ArraysBelazzougui, DjamalGagie, TravisMunro, J. IanNavarro, GonzaloNekrich, Yakovhttps://dl.cerist.dz/handle/CERIST/9982023-11-05T08:24:49Z2021-03-19T00:00:00Zdc.title: Range Majorities and Minorities in Arrays
dc.contributor.author: Belazzougui, Djamal; Gagie, Travis; Munro, J. Ian; Navarro, Gonzalo; Nekrich, Yakov
dc.description.abstract: The problem of parameterized range majority asks us to preprocess a string of length n such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold τ. This is a more tractable version of the classical problem of finding the range mode, which is unlikely to be solvable in polylogarithmic time and linear space. In this paper we give the first linear-space solution with optimal 𝓞(1/τ ) query time, even when τ can be specified with the query. We then consider data structures whose space is bounded by the entropy of the distribution of the symbols in the sequence. For the case when the alphabet size is polynomial on the computer word size, we retain the optimal time within optimally compressed space (i.e., with sublinear redundancy). Otherwise, either the compressed space is increased by an arbitrarily small constant factor or the time rises to any function in (1/τ ) · ω(1). We obtain the same results on the complementary problem of parameterized range minority.
2021-03-19T00:00:00ZWeighted Ancestors in Suffix Trees RevisitedBelazzougui, DjamalKosolobov, DmitryPuglisi, Simon J.Raman, Rajeevhttps://dl.cerist.dz/handle/CERIST/9972023-11-02T14:54:58Z2021-06-30T00:00:00Zdc.title: Weighted Ancestors in Suffix Trees Revisited
dc.contributor.author: Belazzougui, Djamal; Kosolobov, Dmitry; Puglisi, Simon J.; Raman, Rajeev
dc.description.abstract: The weighted ancestor problem is a well-known generalization of the predecessor problem to trees. It is known to require Ω(log log n) time for queries provided 𝒪(n polylog n) space is available and weights are from [0..n], where n is the number of tree nodes. However, when applied to suffix trees, the problem, surprisingly, admits an 𝒪(n)-space solution with constant query time, as was shown by Gawrychowski, Lewenstein, and Nicholson (Proc. ESA 2014). This variant of the problem can be reformulated as follows: given the suffix tree of a string s, we need a data structure that can locate in the tree any substring s[p..q] of s in 𝒪(1) time (as if one descended from the root reading s[p..q] along the way). Unfortunately, the data structure of Gawrychowski et al. has no efficient construction algorithm, limiting its wider usage as an algorithmic tool. In this paper we resolve this issue, describing a data structure for weighted ancestors in suffix trees with constant query time and a linear construction algorithm. Our solution is based on a novel approach using so-called irreducible LCP values.
2021-06-30T00:00:00ZBlock TreesBelazzougui, DjamalCáceres, ManuelGagie, TravisGawrychowski, PawełKärkkäinen, JuhaNavarro, GonzaloOrdóñez, AlbertoPuglisi, Simon J.Tabei, Yasuohttps://dl.cerist.dz/handle/CERIST/9962023-11-02T15:03:36Z2021-05-01T00:00:00Zdc.title: Block Trees
dc.contributor.author: Belazzougui, Djamal; Cáceres, Manuel; Gagie, Travis; Gawrychowski, Paweł; Kärkkäinen, Juha; Navarro, Gonzalo; Ordóñez, Alberto; Puglisi, Simon J.; Tabei, Yasuo
dc.description.abstract: Let string S[1..n] be parsed into z phrases by the Lempel-Ziv algorithm. The corresponding compression algorithm encodes S in 𝒪(z) space, but it does not support random access to S. We introduce a data structure, the block tree, that represents S in 𝒪(z log(n/z)) space and extracts any symbol of T in time 𝒪(log(n/z)), among other space-time tradeoffs. By multiplying the space by the alphabet size, we also support rank and select queries, which are useful for building compressed data structures on top of S. Further, block trees can be built in a scalable manner. Our experiments show that block trees offer relevant space-time tradeoffs compared to other compressed string representations for highly repetitive strings.
2021-05-01T00:00:00ZDIAG a diagnostic web application based on lung CT Scan images and deep learningHadj Bouzid, Amel ImeneYahiaoui, SaïdLounis, AnisBerrani, Sid-AhmedBelbachir, HacèneNaili, QaidAbdi, Mohamed El HafedhBensalah, KawtharBelazzougui, Djamalhttps://dl.cerist.dz/handle/CERIST/9952023-10-16T16:31:41Z2021-05-29T00:00:00Zdc.title: DIAG a diagnostic web application based on lung CT Scan images and deep learning
dc.contributor.author: Hadj Bouzid, Amel Imene; Yahiaoui, Saïd; Lounis, Anis; Berrani, Sid-Ahmed; Belbachir, Hacène; Naili, Qaid; Abdi, Mohamed El Hafedh; Bensalah, Kawthar; Belazzougui, Djamal
dc.description.abstract: Coronavirus disease is a pandemic that has infected millions of people around the world. Lung CT-scans are effective diagnostic tools, but radiologists can quickly become overwhelmed by the flow of infected patients. Therefore, automated image interpretation needs to be achieved. Deep learning (DL) can support critical medical tasks including diagnostics, and DL algorithms have successfully been applied to the classification and detection of many diseases. This work aims to use deep learning methods that can classify patients between Covid-19 positive and healthy patient. We collected 4 available datasets, and tested our convolutional neural networks (CNNs) on different distributions to investigate the generalizability of our models. In order to clearly explain the predictions, Grad-CAM and Fast-CAM visualization methods were used. Our approach reaches more
than 92% accuracy on 2 different distributions. In addition, we propose a computer aided diagnosis web application for Covid-19 diagnosis. The results suggest that our proposed deep learning tool can be integrated to the Covid-19 detection process and be useful for a rapid patient management.
2021-05-29T00:00:00ZSpace-Efficient Representation of Genomic k-Mer Count TablesShibuya, YoshihiroBelazzougui, DjamalKucherov, Gregoryhttps://dl.cerist.dz/handle/CERIST/9942023-10-16T16:42:19Z2021-07-22T00:00:00Zdc.title: Space-Efficient Representation of Genomic k-Mer Count Tables
dc.contributor.author: Shibuya, Yoshihiro; Belazzougui, Djamal; Kucherov, Gregory
dc.description.abstract: k-mer counting is a common task in bioinformatic pipelines, with many dedicated tools available. Output formats could rely on quotienting to reduce the space of k-mers in hash tables, however counts are not usually stored in space-efficient formats. Overall, k-mer count tables for genomic data take a considerable space, easily reaching tens of GB. Furthermore, such tables do not support efficient random-access queries in general. In this work, we design an efficient representation of k-mer count tables supporting fast random-access queries. We propose to apply Compressed Static Functions (CSFs), with space proportional to the empirical zero-order entropy of the counts. For very skewed distributions, like those of k-mer counts in whole genomes, the only currently available implementation of CSFs does not provide a compact enough representation. By adding a Bloom Filter to a CSF we obtain a Bloom-enhanced CSF (BCSF) effectively overcoming this limitation. Furthermore, by combining BCSFs with minimizer-based bucketing of k-mers, we build even smaller representations breaking the empirical entropy lower bound, for large enough k. We also extend these representations to the approximate case, gaining additional space. We experimentally validate these techniques on k-mer count tables of whole genomes (E.Coli and C.Elegans) as well as on k-mer document frequency tables for 29 E.Coli genomes. In the case of exact counts, our representation takes about a half of the space of the empirical entropy, for large enough k’s.
2021-07-22T00:00:00ZSmaller Fully-Functional Bidirectional BWT IndexesBelazzougui,, DjamalCunial, Fabiohttps://dl.cerist.dz/handle/CERIST/9932023-10-16T16:46:57Z2020-09-17T00:00:00Zdc.title: Smaller Fully-Functional Bidirectional BWT Indexes
dc.contributor.author: Belazzougui,, Djamal; Cunial, Fabio
dc.description.abstract: Burrows-Wheeler indexes that support both extending and contracting any substring of the text $T$ of length $n$ on which they are built, in any direction, provide substantial flexibility in traversing the text and can be used to implement several algorithms. The practical appeal of such indexes is contingent on them being compact, and current designs that are sensitive to the compressibility of the input take either $O(e+\REV{e})$ words of space, where $e$ and $\REV{e}$ are the number of right and left extensions of the maximal repeats of $T$, or $O(r\log(n/r)+\REV{r}\log(n/\REV{r}))$ words, where $r$ and $\REV{r}$ are the number of runs in the Burrows-Wheeler transform of $T$ and of its reverse. In this paper we describe a fully-functional bidirectional index that takes $O(m+r+\REV{r})$ words, where $m$ is the number of maximal repeats of $T$, as well as a variant that takes $O(r+\REV{r})$ words.
2020-09-17T00:00:00ZEfficient tree-structured categorical retrievalBelazzougui, DjamalKucherov, Gregoryhttps://dl.cerist.dz/handle/CERIST/9922023-10-11T14:13:37Z2020-06-09T00:00:00Zdc.title: Efficient tree-structured categorical retrieval
dc.contributor.author: Belazzougui, Djamal; Kucherov, Gregory
dc.description.abstract: We study a document retrieval problem in the new framework where D text documents are organized in a category tree with a predefined number h of categories. This situation occurs e.g. with taxomonic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1+o(1))+log D + O(h)) + O(∆) bits of space and O(|p| + t) query time, where n is the total length of the documents, σ the size of the alphabet used in the documents and ∆ is the total number of nodes in the category tree. Another solution uses n(log σ(1+o(1))+O(log D))+O(∆)+O(D log n) bits of space and O(|p| + t log D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.
2020-06-09T00:00:00ZLinear-time string indexing and analysis in small spaceBelazzougui, DjamalCunial, FabioKärkkäinen, JuhaMäkinen, Velihttps://dl.cerist.dz/handle/CERIST/9912023-10-11T13:57:57Z2020-03-09T00:00:00Zdc.title: Linear-time string indexing and analysis in small space
dc.contributor.author: Belazzougui, Djamal; Cunial, Fabio; Kärkkäinen, Juha; Mäkinen, Veli
dc.description.abstract: The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size in bits. In many large-scale applications, the construction of the index and its usage need to be considered as one unit of computation. For example, one can compare two genomes by building a common index for their concatenation, and by detecting common substructures by querying the index. Efficient string indexing and analysis in small space lies also at the core of a number of primitives in the data-intensive field of high-throughput DNA sequencing.
We report the following advances in string indexing and analysis. We show that the BWT of a string T ∈ {1, . . . ,σ }^n can be built in deterministic O (n) time using just O (n log σ ) bits of space, where σ ≤ n. Deterministic linear time is achieved by exploiting a new partial rank data structure that supports queries in constant time, and that might have independent interest. Within the same time and space budget, we can build an index based on the BWT that allows one to enumerate all the internal nodes of the suffix tree of T . Many fundamental string analysis problems, such as maximal repeats, maximal unique matches, and string kernels,
can be mapped to such enumeration, and can thus be solved in deterministic O (n) time and in O (n log σ ) bits of space from the input string, by tailoring the enumeration algorithm to some problem-specific computations.
We also show how to build many of the existing indexes based on the BWT, such as the compressed suffix array, the compressed suffix tree, and the bidirectional BWT index, in randomized O (n) time and in O (n log σ ) bits of space. The previously fastest construction algorithms for BWT, compressed suffix array and compressed suffix tree, which used O (n log σ ) bits of space, took O (n log log σ ) time for the first two structures, and O (n log^ϵ n) time for the third, where ϵ is any positive constant smaller than one. Alternatively, the BWT could be previously built in linear time if one was willing to spend O (n log σ log log_σ n) bits of space. Contrary to the state of the art, our bidirectional BWT index supports every operation in constant time per element in its output.
2020-03-09T00:00:00ZA critical review of quality of service models in mobile ad hoc networksBouchama, NadirAïssani, DjamilDjellab, NataliaNouali-Taboudjemat, Nadiahttps://dl.cerist.dz/handle/CERIST/9902023-10-08T08:39:22Z2019-01-01T00:00:00Zdc.title: A critical review of quality of service models in mobile ad hoc networks
dc.contributor.author: Bouchama, Nadir; Aïssani, Djamil; Djellab, Natalia; Nouali-Taboudjemat, Nadia
dc.description.abstract: Quality of service (QoS) provisioning in mobile ad hoc networks (MANETs) consists of providing a complex functionality in a harsh environment where resources are scarce. Thus, it is very challenging to build an efficient solution to address this issue. The proposed solutions in the literature are broadly classified into four categories, namely: QoS routing protocols, QoS signalling, QoS-aware MAC protocols and QoS models, which are the main concern of our study. The contribution of this paper is threefold: Firstly, we propose a set of guidelines to deal with the challenges facing QoS models design in ad hoc networks. Secondly, we propose a new taxonomy for QoS models in ad hoc networks. Finally, we provide an in-depth survey and discussion of the most relevant proposed frameworks.
2019-01-01T00:00:00ZIoT-DMCP: An IoT data management and control platform for smart citiesBoulkaboul, SaharDjenouri, DjamelBouhafs, SadmiBelaid, Mohandhttps://dl.cerist.dz/handle/CERIST/9892023-10-08T08:08:56Z2019-01-01T00:00:00Zdc.title: IoT-DMCP: An IoT data management and control platform for smart cities
dc.contributor.author: Boulkaboul, Sahar; Djenouri, Djamel; Bouhafs, Sadmi; Belaid, Mohand
dc.description.abstract: This paper presents a design and implementation of a data management platform to monitor and control smart objects in the Internet of Things (IoT). This is through IPv4/IPv6, and by combining IoT specific features and protocols such as CoAP, HTTP and WebSocket. The platform allows anomaly detection in IoT devices and real-time error reporting mechanisms. Moreover, the platform is designed as a standalone application, which targets at extending cloud connectivity to the edge of the network with fog computing. It extensively uses the features and entities provided by the Capillary Networks with a micro-services based architecture linked via a large set of REST APIs, which allows developing applications independently of the heterogeneous devices. The platform addresses the challenges in terms of connectivity, reliability, security and mobility of the Internet of Things through IPv6. The implementation of the platform is evaluated in a smart home scenario and tested via numeric results. The results show low latency, at the order of few ten of milliseconds, for building control over the implemented mobile application, which confirm realtime feature of the proposed solution.
2019-01-01T00:00:00Z