ALCESTE in Basque: adaptation and evaluation
DOI:
https://doi.org/10.26876/uztaro.130.2024.5073
Keywords:
Reinert method, Iramuteq, Basque lexicon, Research tool, Semantic domains
Abstract
This article presents the adaptation to Basque texts, and the evaluation, of the ALCESTE method and of Iramuteq, the software currently used to apply it. The method, designed for the automated classification of large volumes of text, offers tools for identifying semantic domains. The article focuses on the process of adapting the lexicon and on two evaluations, one internal and one external. The internal evaluation used self-descriptions of faculty members from the UPV/EHU education faculties; the external evaluation analyzed a multilingual parallel corpus of Saint Paul's letters from the New Testament in Basque, Spanish, English, and French.
UZTARO 130, 47-66. Bilbao, July-September 2024
License
Copyright (c) 2024 Juan Abasolo, Naia Eguskiza Sanchez
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.