International Journal of Intelligent Information Systems


Extracting Structured Data from Text in Natural Language

Nowadays, the amount of information on the web is tremendous. A large part of it is presented as articles, descriptions, posts and comments, i.e. free text in natural language, and it is hard to make use of it while it remains in this format. In structured form, however, the same information could be used for many purposes. The main idea of this paper is therefore an approach for converting data given as free text in natural language into structured data, for example a table. Structured information is easy to search and analyze. Structured data is quantitative, while unstructured data is qualitative. Overall, a tool that converts text into structured data will not only provide an automatic mechanism for data extraction but will also save considerable resources for processing and storing the extracted data. Extracting data from text will also automate the process of deriving useful insights from data that is usually processed by people. Both the efficiency and the accuracy of the process will increase, and the probability of human error will be minimized. The amount of processed data will no longer be limited by human resources.
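To make the idea of "free text in, table out" concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' method: the keywords point to a RASA-based NLP pipeline, whereas this sketch swaps in a single hand-written regular expression for a hypothetical sentence template ("Name Surname, age, works as a/an occupation in City."). The names `SENTENCE`, `COLUMNS`, `extract_rows` and `rows_to_csv` are illustrative only. Each matching sentence becomes one row with fixed columns, and non-matching text is ignored.

```python
import csv
import io
import re
from typing import Dict, List

# Hypothetical sentence template, e.g.
# "Maria Ivanova, 34, works as an engineer in Sofia."
# A hand-written regex stands in for a trained NLU model here; this is an
# illustrative assumption, not the extraction method described in the paper.
SENTENCE = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<age>\d+), "
    r"works as an? (?P<occupation>[a-z ]+?) in (?P<city>[A-Z][a-z]+)\."
)
COLUMNS = ["name", "age", "occupation", "city"]


def extract_rows(text: str) -> List[Dict[str, str]]:
    """Return one dict (table row) per sentence that matches the template."""
    return [match.groupdict() for match in SENTENCE.finditer(text)]


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise the extracted rows as a CSV table with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    text = (
        "Maria Ivanova, 34, works as an engineer in Sofia. "
        "The weather was nice. "
        "Peter Georgiev, 51, works as a teacher in Varna."
    )
    print(rows_to_csv(extract_rows(text)))
    # name,age,occupation,city
    # Maria Ivanova,34,engineer,Sofia
    # Peter Georgiev,51,teacher,Varna
```

A fixed template like this breaks as soon as the wording changes, which is exactly why a trained language-understanding model is the more practical choice for real free text; the sketch only illustrates the target output shape.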

Data Extraction, Structured Data, Unstructured Data, Automation, NLP, RASA

Zheni Mincheva, Nikola Vasilev, Ventsislav Nikolov, Anatoliy Antonov. (2021). Extracting Structured Data from Text in Natural Language. International Journal of Intelligent Information Systems, 10(4), 74-80.

Copyright © 2021. The authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
