Volume 5, Issue 3, June 2016, Page: 42-47
Presenting an Optimal Method for Constructing an English-Persian Comparable Corpus
Seyede Roya Mohammadi, Computer Engineering Department, Alzahra University, Tehran, Iran
Noushin Riahi, Computer Engineering Department, Alzahra University, Tehran, Iran
Received: Mar. 23, 2016; Accepted: Jun. 7, 2016; Published: Jun. 18, 2016
DOI: 10.11648/j.ijiis.20160503.12
Multilingual corpora are the main resources in cross-language information retrieval. The quality of much research, such as machine translation, depends strongly on the quality of these corpora. Comparable corpora are one such type. They cover a broad range of information, but constructing them poses specific problems: even when the source collections are large, the resulting comparable corpus often contains only a small number of document pairs. In this paper we present a new method for increasing both the quality and the quantity of a comparable corpus. We built a Persian-English comparable corpus from two independent news collections: BBC news in English and Hamshahri news in Persian.
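Building a comparable corpus from two independent news collections means deciding which English and Persian articles cover the same story. The abstract does not specify the authors' method, so the following is only a minimal, generic sketch of one common baseline, not the paper's approach: pair articles published within a few days of each other whose terms overlap after translation through a bilingual dictionary. The dictionary `FA_EN`, the helpers `translate_terms`, `jaccard` and `pair_documents`, and the `max_days` and `threshold` parameters are all hypothetical and chosen for illustration.

```python
from datetime import date

# Hypothetical toy bilingual lexicon (Persian -> English); not from the paper.
FA_EN = {"انتخابات": "election", "ایران": "iran", "نفت": "oil", "قیمت": "price"}


def translate_terms(fa_terms, lexicon):
    """Map Persian terms to English via the lexicon, dropping unknown words."""
    return {lexicon[t] for t in fa_terms if t in lexicon}


def jaccard(a, b):
    """Jaccard overlap of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_documents(en_docs, fa_docs, lexicon, max_days=3, threshold=0.3):
    """Greedily pair each English article with its best unused Persian match.

    en_docs / fa_docs: lists of (doc_id, publication_date, set_of_terms).
    A candidate must be published within max_days of the English article
    and reach at least `threshold` Jaccard overlap on translated terms.
    """
    pairs = []
    used = set()
    for en_id, en_date, en_terms in en_docs:
        best, best_score = None, threshold
        for fa_id, fa_date, fa_terms in fa_docs:
            # Skip Persian articles already paired or too far apart in time.
            if fa_id in used or abs((en_date - fa_date).days) > max_days:
                continue
            score = jaccard(en_terms, translate_terms(fa_terms, lexicon))
            if score >= best_score:
                best, best_score = fa_id, score
        if best is not None:
            used.add(best)
            pairs.append((en_id, best, round(best_score, 2)))
    return pairs
```

With toy inputs such as an English article on oil prices and a Persian article containing نفت and قیمت published one day apart, the pair is accepted with an overlap of 0.67, while an article published more than `max_days` away is never considered. A real system would add stemming, richer translation resources, and term weighting; the greedy one-to-one matching here is just the simplest choice.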
Comparable Corpus, Corpus Quality, Hamshahri Corpus, Query, RATF Factor
