Using Clustering and Modified Classification algorithm without a learning corpus for automatic text summarization

Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar

dc.contributor.author	Aries, Abdelkrime
dc.contributor.author	Oufaida, Houda
dc.contributor.author	Nouali, Omar
dc.date.accessioned	2015-05-13T16:02:12Z
dc.date.available	2015-05-13T16:02:12Z
dc.date.issued	2013-02-05
dc.description.abstract	In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text itself for training. First, we cluster the document's sentences to exploit the diversity of topics, then we apply a learning algorithm (here, Naive Bayes) to each cluster, treating it as a class. After obtaining the classification model, we calculate the score of a sentence in each class, using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences and extract the top-ranked ones as the output summary. We conducted experiments on a corpus of scientific papers, comparing our system to another system, UNIS. We also examine the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method is interesting and gives good performance, and that the addition of new features (which is simple with this method) can improve the summary's accuracy.	fr_FR
dc.identifier.uri	http://dl.cerist.dz/handle/CERIST/747
dc.relation.ispartof	The 20th International Conference on Document Recognition and Retrieval (DRR)	fr_FR
dc.relation.place	San Francisco, California, USA	fr_FR
dc.structure	Recherche d'Information	fr_FR
dc.subject	NLP	fr_FR
dc.subject	IR	fr_FR
dc.subject	Automatic text summarization	fr_FR
dc.subject	Clustering	fr_FR
dc.title	Using Clustering and Modified Classification algorithm without a learning corpus for automatic text summarization	fr_FR
dc.type	Conference paper
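
The pipeline outlined in the abstract (cluster the sentences, treat each cluster as a class, score every sentence against every class, then rank) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the greedy seed-based clustering, cosine similarity over word counts, unigram features with Laplace smoothing, the length-normalized log-likelihood summed over clusters, and all function names and default values (`threshold=0.2`, `size=2`).

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens only (assumed preprocessing)."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(bags, threshold):
    """Greedy clustering (assumed): a sentence joins the first cluster whose
    seed sentence is at least `threshold` similar, else it starts a new cluster."""
    clusters = []  # lists of sentence indices; the first index is the seed
    for index, bag in enumerate(bags):
        for cluster in clusters:
            if cosine(bag, bags[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


def score_sentences(bags, clusters):
    """Score each sentence by its Naive Bayes word log-likelihood in every
    cluster, normalized by sentence length and summed over clusters (assumed)."""
    vocabulary = set().union(*bags)
    # "Training": aggregate word counts per cluster, each cluster being a class.
    models = []
    for cluster in clusters:
        counts = Counter()
        for index in cluster:
            counts.update(bags[index])
        models.append((counts, sum(counts.values()) + len(vocabulary)))
    scores = []
    for bag in bags:
        length = sum(bag.values())
        if length == 0:
            scores.append(float("-inf"))  # empty sentences rank last
            continue
        total = 0.0
        for counts, denominator in models:
            # Laplace-smoothed log P(word | cluster), weighted by word frequency.
            total += sum(
                n * math.log((counts[word] + 1) / denominator)
                for word, n in bag.items()
            ) / length
        scores.append(total)
    return scores


def summarize(sentences, threshold=0.2, size=2):
    """Return the `size` highest-scoring sentences, best first."""
    bags = [Counter(tokenize(s)) for s in sentences]
    clusters = cluster_sentences(bags, threshold)
    scores = score_sentences(bags, clusters)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:size]]
```

Calling `summarize(list_of_sentences, threshold=0.3, size=5)` would return the five top-ranked sentences. Changing `threshold` alters how many clusters (classes) are formed, which is the kind of tuning the abstract says was studied. Additional features would enter as extra likelihood terms inside `score_sentences`.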

Collections

International Conference Papers
