Efficient tree-structured categorical retrieval

Belazzougui, Djamal; Kucherov, Gregory

Date

2020-06-09

Authors

Belazzougui, Djamal

Kucherov, Gregory

Publisher

Leibniz International Proceedings in Informatics (LIPIcs)

Abstract

We study a document retrieval problem in a new framework where D text documents are organized in a category tree with a predefined number h of categories. This situation occurs, e.g., with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1+o(1)) + log D + O(h)) + O(∆) bits of space and O(|p| + t) query time, where n is the total length of the documents, σ is the size of the alphabet used in the documents, and ∆ is the total number of nodes in the category tree. Another solution uses n(log σ(1+o(1)) + O(log D)) + O(∆) + O(D log n) bits of space and O(|p| + t log D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.
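
To make the query in the abstract concrete, the following is a minimal brute-force sketch of what a categorical retrieval query returns. It is not any of the data structures proposed in the paper: it scans every document, so it takes time proportional to n rather than O(|p| + t). The function name `categorical_units`, the representation of each document's position as a root-to-leaf path of category labels, and the toy taxonomy are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def categorical_units(
    docs: Dict[str, str],
    paths: Dict[str, List[str]],
    pattern: str,
    level: int,
) -> Set[str]:
    """Return the categorical units at `level` that contain `pattern`.

    docs  : document id -> document text (hypothetical layout)
    paths : document id -> labels of the document's ancestors in the
            category tree, listed from the top level (index 0) downwards
    A unit is reported once if at least one document below it contains
    the pattern, however many such documents there are.
    """
    units: Set[str] = set()
    for doc_id, text in docs.items():
        if pattern in text:            # naive substring search, no index
            path = paths[doc_id]
            if level < len(path):      # ignore documents with shorter paths
                units.add(path[level])
    return units


# Toy taxonomy with h = 3 levels: kingdom (0), genus (1), species (2).
docs = {
    "d1": "ACGTAC",
    "d2": "TTACGG",
    "d3": "GGGTTT",
    "d4": "CCACGA",
}
paths = {
    "d1": ["Animalia", "Canis", "Canis lupus"],
    "d2": ["Animalia", "Canis", "Canis aureus"],
    "d3": ["Animalia", "Felis", "Felis catus"],
    "d4": ["Plantae", "Rosa", "Rosa canina"],
}

genus_hits = categorical_units(docs, paths, "ACG", level=1)
kingdom_hits = categorical_units(docs, paths, "ACG", level=0)
```

In this toy example the pattern occurs in three documents, but only two genera are reported at level 1, because two of the matching documents share the same genus. Here t counts the reported units, not the matching documents, and the space and time bounds quoted in the abstract are expressed in terms of that t.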

Keywords

Pattern matching, Document retrieval, Category tree, Space-efficient data structures

URI

https://dl.cerist.dz/handle/CERIST/992

Collections

International Conference Papers
