An analysis of the spatial commonsense of multimodal models and language models (Modalitate anitzeko ereduen eta hizkuntza ereduen zentzumen espazialaren azterketa)
DOI:
https://doi.org/10.26876/ikergazte.vi.03.20
Keywords:
Artificial Intelligence, Spatial Commonsense, Language Models, Multimodal Models
Abstract
Spatial commonsense, our knowledge of physical space and spatial relationships, is a fundamental aspect of human cognition. This study evaluates and compares several current models to determine how well they handle spatial commonsense reasoning. Three types of models are considered: text-only models, multimodal models given text inputs, and text-to-image models that generate visual representations from text. Spatial commonsense comes easily to humans, but it has historically been a difficult task for language models. Our results show that current models have improved significantly on these tasks. Ultimately, this work contributes to the development of AI systems that can reason about spatial relationships, an essential step toward human-level understanding of the world.
License
Copyright (c) 2025 IkerGazte. Nazioarteko ikerketa euskaraz

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.