ALCESTE in Basque: adaptation and evaluation

Authors

  • Juan UPV/EHU
  • Naia Eguskiza Sanchez UPV/EHU

DOI:

https://doi.org/10.26876/uztaro.130.2024.5073

Keywords:

Reinert method, Iramuteq, Basque lexicon, Research tool, Semantic domains

Abstract

This article presents the adaptation and evaluation, for Basque texts, of the ALCESTE method and of Iramuteq, the software currently used to apply it. The method classifies large volumes of text automatically and offers tools for identifying semantic domains. The article focuses on the lexicon adaptation process and on the internal and external evaluations. The internal evaluation used self-descriptions of faculty members from the UPV/EHU education faculties; the external evaluation analyzed a multilingual parallel corpus of Saint Paul's letters from the New Testament in Basque, Spanish, English, and French.

UZTARO 130, 47-66. Bilbao, July-September 2024

Published

2024-10-15

How to Cite

Juan, & Eguskiza Sanchez, N. (2024). ALCESTE in Basque: adaptation and evaluation. Uztaro. Giza Eta Gizarte-Zientzien Aldizkaria, (130), 47–66. https://doi.org/10.26876/uztaro.130.2024.5073

Section

Article