Efficient tree-structured categorical retrieval
dc.contributor.author | Belazzougui, Djamal | |
dc.contributor.author | Kucherov, Gregory | |
dc.date.accessioned | 2023-10-11T14:06:15Z | |
dc.date.available | 2023-10-11T14:06:15Z | |
dc.date.issued | 2020-06-09 | |
dc.description.abstract | We study a document retrieval problem in the new framework where D text documents are organized in a category tree with a predefined number h of categories. This situation occurs e.g. with taxomonic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1+o(1))+log D + O(h)) + O(∆) bits of space and O(|p| + t) query time, where n is the total length of the documents, σ the size of the alphabet used in the documents and ∆ is the total number of nodes in the category tree. Another solution uses n(log σ(1+o(1))+O(log D))+O(∆)+O(D log n) bits of space and O(|p| + t log D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time. | |
dc.identifier.isbn | 978-3-95977-149-8 | |
dc.identifier.issn | 1868-8969 | |
dc.identifier.uri | https://dl.cerist.dz/handle/CERIST/992 | |
dc.publisher | Leibniz International Proceedings in Informatics (LIPIcs) | |
dc.relation.ispartofseries | 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020) | |
dc.relation.pages | 1-11 | |
dc.relation.place | Copenhagen, Denmark | |
dc.structure | Calcul pervasif et mobile (Pervasive and Mobile Computing group) | |
dc.subject | Pattern matching | |
dc.subject | Document retrieval | |
dc.subject | Category tree | |
dc.subject | Space- efficient data structures | |
dc.title | Efficient tree-structured categorical retrieval | |
dc.type | Conference paper |