Language identification plays an important role in natural language processing (NLP) applications as one of the pre-processing
steps. Various mechanisms are in use today that achieve this task with high recognition rates.

Recent years have seen rapid growth in international communication, which has led to a need for systems capable of correctly
identifying the languages of documents. Possible applications of language identification include information retrieval, web
crawlers, text mining, and email filtering.

The paper uses a process called G-LDA [1], which combines concepts from Latent Dirichlet Allocation (LDA) with genetic evolution
techniques. The process forms a set of words that occur with high frequency in a given document. The method was tested on the
Leipzig Corpora. The phrases evolved over successive generations showed significant improvement.
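
To make the idea of evolving a high-frequency word set concrete, the following is a minimal Python sketch of the genetic-evolution part only. It is not the G-LDA procedure of [1]: the LDA component is omitted, and the fitness function (summed word frequency), the function name `evolve_word_set`, and all parameter values are assumptions chosen for illustration.

```python
import random
from collections import Counter


def word_frequencies(text):
    """Count lowercase, whitespace-separated tokens in a document."""
    return Counter(text.lower().split())


def fitness(candidate, freqs):
    """Assumed fitness: total frequency of the candidate's words."""
    return sum(freqs[w] for w in candidate)


def evolve_word_set(text, set_size=3, population_size=20,
                    generations=30, mutation_rate=0.2, seed=0):
    """Evolve a word set with high total frequency (illustrative only)."""
    rng = random.Random(seed)
    freqs = word_frequencies(text)
    vocab = sorted(freqs)
    set_size = min(set_size, len(vocab))

    # Initial population: random word sets drawn from the vocabulary.
    population = [frozenset(rng.sample(vocab, set_size))
                  for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation

    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=lambda c: fitness(c, freqs), reverse=True)
        history.append(fitness(population[0], freqs))
        survivors = population[: population_size // 2]

        children = []
        while len(survivors) + len(children) < population_size:
            # Crossover: draw the child's words from the union of two parents.
            a, b = rng.sample(survivors, 2)
            child = set(rng.sample(sorted(a | b), set_size))
            # Mutation: occasionally drop one word.
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
            # Refill from the vocabulary so every child has set_size words.
            while len(child) < set_size:
                child.add(rng.choice(vocab))
            children.append(frozenset(child))
        population = survivors + children

    best = max(population, key=lambda c: fitness(c, freqs))
    history.append(fitness(best, freqs))
    return best, history


if __name__ == "__main__":
    sample = "the cat and the dog and the bird saw the cat and a fish"
    best, history = evolve_word_set(sample)
    print(sorted(best), history[0], history[-1])
```

Because the fitter half of each generation is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. A real language identifier would additionally need per-language reference data to compare the evolved word sets against, which this example does not attempt.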