Ikusizko Hizkuntza-ereduetan Kanpo Ezagutza eta Arrazoimendu Espaziala Txertatzen (Integrating External Knowledge and Spatial Reasoning into Vision-and-Language Models)

Ander Salaberria; Gorka Azkune; Eneko Agirre

doi:10.26876/ikergazte.vi.03.24

Authors

Ander Salaberria HiTZ Zentroa, Euskal Herriko Unibertsitatea UPV/EHU
Gorka Azkune HiTZ Zentroa, Euskal Herriko Unibertsitatea UPV/EHU
Eneko Agirre HiTZ Zentroa, Euskal Herriko Unibertsitatea UPV/EHU

DOI:

https://doi.org/10.26876/ikergazte.vi.03.24

Keywords:

Natural language processing, Computer vision, Vision-and-language models, World knowledge integration, Spatial reasoning

Abstract

The fields of natural language processing (NLP) and computer vision (CV) have advanced considerably in recent years. Although work bridging NLP and CV has also progressed, these systems still have weaknesses with no trivial solution. In this work, we present the findings of a PhD thesis in which we analyzed two limitations of current vision-and-language models: world knowledge integration and spatial reasoning. On the one hand, we verbalized images to better leverage the world knowledge that is implicitly encoded in language models. On the other hand, we exploited the generation of synthetic data from object annotations to improve the spatial reasoning of both language models and text-to-image generators.
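
To make the second idea concrete, the following is a minimal sketch of how a synthetic spatial caption could be derived from object annotations. It is not the authors' pipeline: the `Box` class, the `spatial_relation` and `synthetic_caption` helpers, the four-relation vocabulary, and the center-comparison heuristic are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical object annotation: a label plus a bounding box in pixels."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def spatial_relation(subject: Box, other: Box) -> str:
    """Pick a coarse relation by comparing box centers.

    Assumes image coordinates, where y grows downward. The dominant axis
    (larger absolute offset) decides between horizontal and vertical relations.
    """
    sx, sy = subject.center
    ox, oy = other.center
    dx, dy = sx - ox, sy - oy
    if abs(dx) >= abs(dy):
        return "to the left of" if dx < 0 else "to the right of"
    return "above" if dy < 0 else "below"


def synthetic_caption(subject: Box, other: Box) -> str:
    """Verbalize the relation between two annotated objects as a sentence."""
    return f"The {subject.label} is {spatial_relation(subject, other)} the {other.label}."


if __name__ == "__main__":
    dog = Box("dog", 10, 40, 60, 90)
    cat = Box("cat", 120, 50, 170, 95)
    print(synthetic_caption(dog, cat))  # The dog is to the left of the cat.
```

Captions produced this way could serve as text paired with the annotated image, or as prompts for a text-to-image generator; how the thesis actually builds and uses its synthetic data is not specified in this abstract.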

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
