Euskara eta gaztelaniazko kontra-narratiben sorkuntza: datuen sorrera eta ebaluazioa (Counter-narrative generation in Basque and Spanish: data creation and evaluation)
DOI:
https://doi.org/10.26876/ikergazte.vi.03.16
Keywords:
Counter Narratives, Hate Speech, Multilinguality, Text Generation
Abstract
Counter Narratives (CNs) are non-negative responses to Hate Speech (HS) that help reduce online hatred and its spread. Despite the rise of HS online, research on automatic CN generation remains limited, and most of it has been done for English. This paper introduces CONAN-EUS, a Basque and Spanish dataset for CN generation, created using Machine Translation (MT) and professional post-editing. As a parallel corpus to the English CONAN dataset, it enables research on multilingual and crosslingual CN generation. Experiments with the mT5 language model show that training on post-edited data improves CN generation quality compared to training only on MT data. Manual evaluation confirms that manual revision of the data remains crucial. Multilingual augmentation benefits Spanish but not Basque, highlighting challenges in multilingual generative models. Content Warning: This paper contains examples of offensive language that do not reflect the authors’ views.
License
Copyright (c) 2025 IkerGazte. Nazioarteko ikerketa euskaraz

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
