Ikusizko Hizkuntza-ereduetan Kanpo Ezagutza eta Arrazoimendu Espaziala Txertatzen [Integrating External Knowledge and Spatial Reasoning into Vision-and-Language Models]

Authors

  • Ander Salaberria, HiTZ Zentroa, Euskal Herriko Unibertsitatea UPV/EHU
  • Gorka Azkune, HiTZ Zentroa, Euskal Herriko Unibertsitatea UPV/EHU
  • Eneko Agirre, HiTZ Zentroa, Euskal Herriko Unibertsitatea UPV/EHU

DOI:

https://doi.org/10.26876/ikergazte.vi.03.24

Keywords:

Natural language processing, Computer vision, Vision-and-language models, World knowledge integration, Spatial reasoning

Abstract

Natural language processing (NLP) and computer vision (CV) have both advanced considerably in recent years. Although work bridging NLP and CV has advanced as well, current systems still have weaknesses with no trivial solution. In this work, we present the findings of a PhD thesis in which we analyzed two limitations of current vision-and-language models: world knowledge integration and spatial reasoning. On the one hand, we verbalized images to better leverage the world knowledge that is implicitly encoded in language models. On the other hand, we generated synthetic data from object annotations to improve the spatial reasoning of both language models and text-to-image generators.

Published

2025-05-30

How to Cite

Salaberria, A., Azkune, G., & Agirre, E. (2025). Ikusizko Hizkuntza-ereduetan Kanpo Ezagutza eta Arrazoimendu Espaziala Txertatzen. IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 195–202. https://doi.org/10.26876/ikergazte.vi.03.24