Amharic Text Summarization for News Items Posted on Social Media

Abaynew Guadie; Debela Tesfaye; Teferi Kebebew

doi:10.11648/j.ijiis.20211006.14

Peer-Reviewed

Published in International Journal of Intelligent Information Systems (Volume 10, Issue 6)

Received: 25 September 2021 Accepted: 19 October 2021 Published: 24 December 2021

Abstract

This paper introduces Amharic text summarization for news items posted on social media. It summarizes Amharic posts collected over time from Twitter and Facebook. A main problem with social media text is that readers encounter many duplicate posts in Amharic. To find the information they are looking for, users have to locate the relevant posts and read their important portions. Summarization addresses this information overload by presenting a condensed, current representation of the posted documents. Our proposed approach has three main components. First, we calculate the similarity between posted documents by comparing pairs of sentences. Second, we cluster the documents with the K-means algorithm, grouping them according to the similarity results. Third, we summarize each cluster of posted documents individually using the TF-IDF algorithm, a statistical method that ranks content by the frequency of its terms. The summarization technique is extractive: the highest-ranked sentences in the posted documents are extracted to form the summary, and the user can specify the size of the summary. In the first experiment, the highest F-measure score is 87.07% at an extraction rate of 30%, for the cluster of protest posts. In the second experiment, the highest F-measure score is 84% at an extraction rate of 30%, for the drought posts. In the third experiment, the highest F-measure score is 91.37% at an extraction rate of 30%, for the sports posts, and in the fourth experiment the highest F-measure score is 93.52% at an extraction rate of 30%.
When the requested summary size is increased, the extraction rate also increases. The evaluation shows that the system produces very good results when summarizing texts posted on social media.
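
The three components named in the abstract (similarity, K-means clustering, TF-IDF sentence ranking) and the F-measure evaluation can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the whitespace tokenizer, the smoothed IDF formula, the use of cosine similarity for cluster assignment, and the choice of the first k documents as seeds are all assumptions made for illustration, and the paper's Amharic preprocessing is not reproduced.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document.

    Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iterations=10):
    """K-means variant that assigns by highest cosine similarity.

    Assumption: the first k vectors seed the centroids, which keeps
    the sketch deterministic.
    """
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old centroid for an empty cluster
            total = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


def summarize(sentences, rate=0.3):
    """Keep the top `rate` fraction of sentences by summed TF-IDF
    weight, returned in their original order."""
    vectors = tfidf_vectors([s.split() for s in sentences])
    scores = [sum(v.values()) for v in vectors]
    keep = max(1, round(rate * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i],
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


def f_measure(system, reference):
    """Harmonic mean of precision and recall over selected sentences."""
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(sys_set)
    recall = overlap / len(ref_set)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, posts would first be grouped with `kmeans(tfidf_vectors(tokenized_posts), k)`, and `summarize` would then be called once per cluster with the extraction rate chosen by the user (for example `rate=0.3`). Comparing the extracted sentences with a manually prepared reference summary through `f_measure` gives a score of the kind reported in the abstract.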

Published in	International Journal of Intelligent Information Systems (Volume 10, Issue 6)
DOI	10.11648/j.ijiis.20211006.14
Page(s)	125-138
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Amharic, Similarity, Text Summarization, Clustering, TF-IDF Algorithm, Social Media, Facebook, Twitter

Cite This Article

APA Style

Abaynew Guadie, Debela Tesfaye, Teferi Kebebew. (2021). Amharic Text Summarization for News Items Posted on Social Media. International Journal of Intelligent Information Systems, 10(6), 125-138. https://doi.org/10.11648/j.ijiis.20211006.14

ACS Style

Abaynew Guadie; Debela Tesfaye; Teferi Kebebew. Amharic Text Summarization for News Items Posted on Social Media. Int. J. Intell. Inf. Syst. 2021, 10(6), 125-138. doi: 10.11648/j.ijiis.20211006.14

AMA Style

Abaynew Guadie, Debela Tesfaye, Teferi Kebebew. Amharic Text Summarization for News Items Posted on Social Media. Int J Intell Inf Syst. 2021;10(6):125-138. doi: 10.11648/j.ijiis.20211006.14

@article{10.11648/j.ijiis.20211006.14,
  author = {Abaynew Guadie and Debela Tesfaye and Teferi Kebebew},
  title = {Amharic Text Summarization for News Items Posted on Social Media},
  journal = {International Journal of Intelligent Information Systems},
  volume = {10},
  number = {6},
  pages = {125-138},
  doi = {10.11648/j.ijiis.20211006.14},
  url = {https://doi.org/10.11648/j.ijiis.20211006.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijiis.20211006.14},
 year = {2021}
}

TY  - JOUR
T1  - Amharic Text Summarization for News Items Posted on Social Media
AU  - Abaynew Guadie
AU  - Debela Tesfaye
AU  - Teferi Kebebew
Y1  - 2021/12/24
PY  - 2021
N1  - https://doi.org/10.11648/j.ijiis.20211006.14
DO  - 10.11648/j.ijiis.20211006.14
T2  - International Journal of Intelligent Information Systems
JF  - International Journal of Intelligent Information Systems
JO  - International Journal of Intelligent Information Systems
SP  - 125
EP  - 138
PB  - Science Publishing Group
SN  - 2328-7683
UR  - https://doi.org/10.11648/j.ijiis.20211006.14
VL  - 10
IS  - 6
ER  -

Author Information

Abaynew Guadie
Faculty of Computing, Jimma Institute of Technology, Jimma University, Jimma, Ethiopia

Debela Tesfaye
Faculty of Computing, Jimma Institute of Technology, Jimma University, Jimma, Ethiopia

Teferi Kebebew
Faculty of Computing, Jimma Institute of Technology, Jimma University, Jimma, Ethiopia
