Developing Natural Language Processing tools: Lemmatizer, disambiguator, named entity recognizer, and parser

Abstract: This project, successfully completed in December 2009, developed a basic set of natural language processing tools: a lemmatizer, a disambiguator, a named entity recognizer, and a parser. The morphological analyzer recognizes 14,205 nouns, 21,704 adjectives, and 10,760 verbs, as well as words from other grammatical categories (pronouns, adverbs, etc.) of the Spanish spoken in Cuba, selected on the basis of their frequency of usage. The parser extends FreeLing's rule set to capture dependency relations between lexical units that the original tool does not cover. Experiments show a 10% improvement over the original version of FreeLing. The system also includes a method for building deep parse trees by transforming shallow parse trees according to dependency trees. The resulting tools will be useful for linguistic studies and for applications that require natural language processing.
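To make the division of labor between the morphological analyzer and the disambiguator concrete, the following is a minimal sketch, not the project's implementation. It assumes a toy lexicon that maps each surface form to its candidate (lemma, part-of-speech) analyses and a table of hypothetical usage counts. The names `LEXICON`, `FREQUENCY`, `analyze`, and `disambiguate` are all illustrative, and the most-frequent-analysis rule is a simple baseline swapped in for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: surface form -> candidate (lemma, part-of-speech) analyses.
# The project's actual lexicon and tag set are not reproduced here.
LEXICON = {
    "casa": [("casa", "NOUN"), ("casar", "VERB")],
    "vino": [("vino", "NOUN"), ("venir", "VERB")],
    "el": [("el", "DET")],
}

# Hypothetical usage counts, used only to rank ambiguous analyses.
FREQUENCY = Counter({
    ("casa", "NOUN"): 120, ("casar", "VERB"): 15,
    ("vino", "NOUN"): 40, ("venir", "VERB"): 90,
    ("el", "DET"): 1000,
})


def analyze(word):
    """Morphological analysis: return every (lemma, tag) pair, or an UNKNOWN placeholder."""
    form = word.lower()
    return LEXICON.get(form, [(form, "UNKNOWN")])


def disambiguate(word):
    """Baseline disambiguation: keep the most frequent analysis, ignoring context."""
    return max(analyze(word), key=lambda analysis: FREQUENCY[analysis])


print([disambiguate(w) for w in "El vino".split()])
```

With these made-up counts, "vino" in "El vino" ("the wine") is tagged as a form of the verb *venir*. A context-free frequency rule cannot tell the two readings apart, which is why a disambiguator normally takes the surrounding words into account.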

Project Director: Dr. Leonel Ruiz Miyares

Project Co-director: Dr. Aurora Pons Porrata

Members: Yunior Ramírez Cruz, MSc, René A. Viant Morán, BSc, Carlos A. Fernández Cairó, BSc, Camilo Acosta Arafet, BSc, Jorge A. Ríos García, BSc, Yamila Cobos Castillo, BSc, Dr. Eloína Miyares Bermúdez, Dr. Reynaldo J. Gil García, María Rosa Álvarez Silva (technician), Mileidis Quintana Polanco, MSc, Lisette García Moya, MSc, and Henry Anaya Sánchez, BSc