Counter-Narrative Generation in Basque and Spanish: Data Creation and Evaluation (original title: Euskara eta gaztelaniazko kontra-narratiben sorkuntza: datuen sorrera eta ebaluazioa)

Authors

  • Jaione Bengoetxea, HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country UPV/EHU
  • Itziar Gonzalez-Dios, HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country UPV/EHU
  • Rodrigo Agerri, HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country UPV/EHU

DOI:

https://doi.org/10.26876/ikergazte.vi.03.16

Keywords:

Counter Narratives, Hate Speech, Multilinguality, Text Generation

Abstract

Counter Narratives (CNs) are non-negative responses to Hate Speech (HS) that help reduce online hatred and its spread. Despite the rise in online HS, research on automatic CN generation remains limited, and most of it has been done for English. This paper introduces CONAN-EUS, a Basque and Spanish dataset for CN generation, created using Machine Translation (MT) followed by professional post-editing. Because it is parallel to the English CONAN dataset, it enables research on multilingual and crosslingual CN generation. Experiments with the mT5 language model show that training on post-edited data improves CN generation quality compared to training on MT data alone. Manual evaluation confirms that human revision of the data remains crucial. Multilingual data augmentation benefits Spanish but not Basque, highlighting open challenges for multilingual generative models. Content Warning: This paper contains examples of offensive language that do not reflect the authors’ views.

Published

2025-05-30

How to Cite

Bengoetxea, J., Gonzalez-Dios, I., & Agerri, R. (2025). Euskara eta gaztelaniazko kontra-narratiben sorkuntza: datuen sorrera eta ebaluazioa. IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 133–140. https://doi.org/10.26876/ikergazte.vi.03.16