Extracting Structured Data from Text in Natural Language

Zheni Mincheva; Nikola Vasilev; Ventsislav Nikolov; Anatoliy Antonov

doi:10.11648/j.ijiis.20211004.16

Peer-Reviewed

Published in International Journal of Intelligent Information Systems (Volume 10, Issue 4)

Received: 6 August 2021 Accepted: 20 August 2021 Published: 31 August 2021

Abstract

The amount of information on the web is tremendous. A large part of it is presented as articles, descriptions, posts and comments, i.e. free text in natural language, which is hard to make use of in that format. In structured form, however, it could serve many purposes. The main idea of this paper is an approach for extracting data given as free text in natural language into structured data, for example a table. Structured information is easy to search and analyze. Structured data is quantitative, while unstructured data is qualitative. A tool that converts text into structured data will not only provide an automatic mechanism for data extraction but will also save considerable resources for processing and storing the extracted data. Data extraction from text will also automate the process of extracting useful insights from data that is usually processed by people. The efficiency and accuracy of the process will increase, and the probability of human error will be minimized. The amount of processed data will no longer be limited by human resources.
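To make the idea of turning free text into a table concrete, here is a minimal Python sketch. It is not the authors' method: the keywords suggest an NLP pipeline built on RASA, whereas this sketch substitutes plain regular expressions as a stand-in. The sample sentences, the column names (`year`, `price_eur`, `city`) and the helper names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical free-text inputs; not taken from the paper.
TEXTS = [
    "Selling a 2015 Volkswagen Golf for 8500 EUR in Varna.",
    "For sale: Toyota Corolla, year 2018, price 12000 EUR, located in Sofia.",
]

# One illustrative pattern per target column (assumed schema).
# Each pattern exposes the captured text through a named group "value".
PATTERNS = {
    "year": re.compile(r"\b(?P<value>(?:19|20)\d{2})\b"),
    "price_eur": re.compile(r"(?P<value>\d+)\s*EUR\b"),
    "city": re.compile(r"\bin\s+(?P<value>[A-Z][a-z]+)"),
}

# Columns whose captured text should be converted to integers.
NUMERIC_COLUMNS = {"year", "price_eur"}


def extract_row(text):
    """Return one table row (a dict) for a single free-text record.

    Columns with no match are set to None rather than guessed.
    """
    row = {}
    for column, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            row[column] = None
        elif column in NUMERIC_COLUMNS:
            row[column] = int(match.group("value"))
        else:
            row[column] = match.group("value")
    return row


def to_table(texts):
    """Convert a list of free-text records into a list of rows."""
    return [extract_row(text) for text in texts]


if __name__ == "__main__":
    for row in to_table(TEXTS):
        print(row)
```

Once the records are rows with typed columns, they can be filtered or aggregated like any other table, for example by selecting all rows whose `price_eur` is below a threshold. Hand-written patterns like these break as soon as the wording varies, which is one reason to use a trained entity extractor instead.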

Published in	International Journal of Intelligent Information Systems (Volume 10, Issue 4)
DOI	10.11648/j.ijiis.20211004.16
Page(s)	74-80
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Data Extraction, Structured Data, Unstructured Data, Automation, NLP, RASA

Cite This Article

APA Style

Zheni Mincheva, Nikola Vasilev, Ventsislav Nikolov, Anatoliy Antonov. (2021). Extracting Structured Data from Text in Natural Language. International Journal of Intelligent Information Systems, 10(4), 74-80. https://doi.org/10.11648/j.ijiis.20211004.16

Author Information

Zheni Mincheva, Eurorisk Systems Ltd, Varna, Bulgaria

Nikola Vasilev, Eurorisk Systems Ltd, Varna, Bulgaria

Ventsislav Nikolov, Eurorisk Systems Ltd, Varna, Bulgaria

Anatoliy Antonov, Eurorisk Systems Ltd, Varna, Bulgaria
