Modalitate anitzeko ereduen eta hizkuntza ereduen zentzumen espazialaren azterketa (An analysis of the spatial commonsense of multimodal models and language models)

Authors

  • Oier Ijurco, Ixa Taldea - Euskal Herriko Unibertsitatea UPV/EHU
  • Oier Lopez de Lacalle, Ixa Taldea - Euskal Herriko Unibertsitatea UPV/EHU
  • Gorka Azkune, Ixa Taldea - Euskal Herriko Unibertsitatea UPV/EHU

DOI:

https://doi.org/10.26876/ikergazte.vi.03.20

Keywords:

Artificial Intelligence, Spatial Commonsense, Language Models, Multimodal Models

Abstract

Spatial commonsense, our knowledge of physical space and spatial relationships, is a fundamental aspect of human cognition. This study evaluates and compares several current models to determine how well they handle spatial commonsense reasoning. Three types of models are considered: text-only models, multimodal models with text inputs, and text-to-image models that generate visual representations from text. Spatial commonsense is easy for humans, but it has historically been a difficult task for language models. Our results show that current models have improved significantly on these kinds of tasks. Ultimately, this work contributes to the development of AI systems that can reason about spatial relationships, an essential step toward human-level understanding of the world.

Published

2025-05-30

How to Cite

Ijurco, O., Lopez de Lacalle, O., & Azkune, G. (2025). Modalitate anitzeko ereduen eta hizkuntza ereduen zentzumen espazialaren azterketa. IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 165–172. https://doi.org/10.26876/ikergazte.vi.03.20