The Text & Information Processing group has developed a tool, the Morphological Parameterizer of Spanish Texts (ParamText TIP), to help writers and experts working with books, novels, fiction, poetry and all other kinds of texts. ParamText TIP analyzes a document and extracts statistical information from it: it builds the vocabulary of the text, performs the morphological analysis of every word, and measures the distribution of words according to different criteria. Every analysis distinguishes between stopwords and non-stopwords. The results are displayed in charts and tables and can be exported to Microsoft Excel for further study by the user.
ParamText TIP performs a lexical analysis of the text, counting its paragraphs, sentences, words and characters. For these units it also reports the number of sentences, words and characters in each paragraph, the number of words and characters in each sentence, and the number of characters in each word. It provides metrics such as the frequency of occurrence of each word in the text, the center of gravity of the words, and the distribution of words according to their first appearance in the text and their frequency of use in Spanish. It also lists the full vocabulary of the document in a table.
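The kind of lexical counting described above can be illustrated with a short Python sketch. This is not ParamText TIP's implementation; the segmentation rules are assumptions chosen for illustration: paragraphs are separated by blank lines, a sentence ends in `.`, `!` or `?` followed by whitespace, a word is a run of Unicode letters (so accented Spanish letters and `ñ` count), and "characters" means letters inside words. The function name `lexical_profile` is hypothetical.

```python
import re
from collections import Counter

# Assumed segmentation rules (illustrative only, not ParamText TIP's own):
PARAGRAPH_SPLIT = re.compile(r"\n\s*\n")     # blank line separates paragraphs
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")  # split after . ! ? plus whitespace
WORD = re.compile(r"[^\W\d_]+")              # run of Unicode letters (á, ñ, ü, ...)


def lexical_profile(text: str) -> dict:
    """Hypothetical helper: count paragraphs, sentences, words and letters."""
    paragraphs = [p.strip() for p in PARAGRAPH_SPLIT.split(text) if p.strip()]
    sentences_per_paragraph = []
    words_per_sentence = []
    words = []
    for paragraph in paragraphs:
        sentences = [s for s in SENTENCE_SPLIT.split(paragraph) if s.strip()]
        sentences_per_paragraph.append(len(sentences))
        for sentence in sentences:
            sentence_words = WORD.findall(sentence)
            words_per_sentence.append(len(sentence_words))
            words.extend(sentence_words)
    return {
        "paragraphs": len(paragraphs),
        "sentences": sum(sentences_per_paragraph),
        "words": len(words),
        "characters": sum(len(w) for w in words),  # letters inside words only
        "sentences_per_paragraph": sentences_per_paragraph,
        "words_per_sentence": words_per_sentence,
        "frequency": Counter(w.lower() for w in words),  # case-insensitive
    }
```

For example, `lexical_profile("El gato come. El perro duerme.\n\n¿Dónde está el gato?")` reports 2 paragraphs, 3 sentences and 10 words, with "el" as the most frequent word (3 occurrences). A real tool would need more careful rules for abbreviations, numbers and dialogue punctuation.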
ParamText TIP also analyzes the text morphologically, reporting the grammatical category of each word and its morphological inflection. Because it does not parse sentences, it does not disambiguate among the several morphological readings a word may have: each word is recognized morphologically regardless of its function in the sentence. The Text & Information Processing group is working on also extracting the grammatical function of each word in its sentence.
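Recognition without disambiguation means that an ambiguous word form is reported with all of its possible readings. The Python sketch below illustrates this with a hypothetical two-entry lexicon (`LEXICON`, `Analysis` and `analyze` are illustrative names, not part of ParamText TIP); a real analyzer would draw on a full Spanish morphological dictionary and inflection rules rather than a hand-written table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    category: str
    features: str


# Hypothetical miniature lexicon: each word form maps to every reading it can have.
LEXICON: dict[str, list[Analysis]] = {
    "vino": [
        Analysis("vino", "noun", "masculine singular"),  # "wine"
        Analysis("venir", "verb", "preterite indicative, 3rd person singular"),
    ],
    "como": [
        Analysis("comer", "verb", "present indicative, 1st person singular"),
        Analysis("como", "adverb", ""),
        Analysis("como", "conjunction", ""),
    ],
}


def analyze(word: str) -> list[Analysis]:
    """Return every reading of the word form without choosing among them."""
    return LEXICON.get(word.lower(), [])
```

In "El vino estaba frío" only the noun reading fits, but because no parsing is done, `analyze("vino")` still returns both the noun and the verb readings; choosing between them is exactly the step that requires syntactic analysis.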
In all of its analyses and results, ParamText TIP distinguishes between words with semantic meaning and stopwords. It ships with a default set of stopwords, which users can modify at any time to suit their needs.
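Separating stopwords from meaningful words, with a default list the user can replace, can be sketched in Python as follows. The stopword set shown is a small hypothetical sample, not ParamText TIP's actual default list, and `split_by_stopwords` is an illustrative name; matching is assumed to be case-insensitive.

```python
from collections import Counter
from typing import Iterable

# Small illustrative default list (hypothetical; not ParamText TIP's real list).
DEFAULT_STOPWORDS = frozenset({"el", "la", "los", "las", "de", "y", "en", "que", "un", "una"})


def split_by_stopwords(
    words: Iterable[str], stopwords: Iterable[str] = DEFAULT_STOPWORDS
) -> tuple[Counter, Counter]:
    """Return (meaningful-word frequencies, stopword frequencies), case-insensitively."""
    stop = {w.lower() for w in stopwords}
    content_counts: Counter = Counter()
    stop_counts: Counter = Counter()
    for word in words:
        key = word.lower()
        # Route each word to the matching counter.
        (stop_counts if key in stop else content_counts)[key] += 1
    return content_counts, stop_counts
```

Passing a different `stopwords` argument plays the role of the user editing the default list: `split_by_stopwords(words, {"gato"})` treats "gato" as a stopword and everything else, including "el", as meaningful.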
ParamText TIP was developed as the Computer Engineering graduation thesis of Juan Carlos Santana Herrera. The project was directed by Francisco Javier Carreras Riudavets, and Zenón Hernández Figueroa and Gustavo Rodríguez Rodríguez contributed to the development of the libraries and the syllabification.
Carreras-Riudavets, F.; Santana-Herrera, J.C.; Hernández-Figueroa, Z.; Rodríguez-Rodríguez, G. (2011). Parametrizador morfológico de textos - ParamText TIP. Available at https://tulengua.es