Integrating External Knowledge and Spatial Reasoning into Vision-and-Language Models
DOI:
https://doi.org/10.26876/ikergazte.vi.03.24
Keywords:
Natural language processing, Computer vision, Vision-and-language models, World knowledge integration, Spatial reasoning
Abstract
The fields of natural language processing (NLP) and computer vision (CV) have advanced rapidly in recent years. Although work bridging NLP and CV has also progressed, current systems still have weaknesses with no trivial solution. In this work, we present the findings of a PhD thesis in which we analyzed two limitations of current vision-and-language models: world knowledge integration and spatial reasoning. On the one hand, we verbalized images to better leverage the world knowledge that is implicitly encoded in language models. On the other hand, we generated synthetic data from object annotations to improve the spatial reasoning of both language models and text-to-image generators.
License
Copyright (c) 2025 IkerGazte. Nazioarteko ikerketa euskaraz

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
