Molekulen propietateen iragarpena sare neuronalen bitartez datu-urritasun egoeretan (Prediction of molecular properties with neural networks in data-scarcity scenarios)

Authors

  • Amaia Elizaran, University of the Basque Country (UPV/EHU); CSIC
  • Gustavo Ariel Schwartz Pomeraniec, University of the Basque Country (UPV/EHU); CSIC

DOI:

https://doi.org/10.26876/ikergazte.vi.05.12

Keywords:

Recurrent Neural Network, SMILES, data scarcity

Abstract

We present a Recurrent Neural Network (RNN) that predicts molecular properties based solely on molecular structure. The SMILES representations of the molecular structures are fed to the network as input. In general, Artificial Neural Networks work well when plenty of input data is available, but they perform poorly under data scarcity. In this work, we focus specifically on the problem of data scarcity and analyze different approaches to tackle it. Our hypothesis is that training the model with similar data will improve the results. We consider two distinct kinds of similarity: string similarity between the SMILES encodings, and similarity between the feature vectors.
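
The two kinds of similarity mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which metrics were used, so everything here is an assumption made for illustration: string similarity is approximated with the standard library's `difflib.SequenceMatcher` ratio, feature-vector similarity with cosine similarity, and the helper names (`string_similarity`, `cosine_similarity`, `most_similar`) are hypothetical.

```python
from difflib import SequenceMatcher
import math


def string_similarity(smiles_a, smiles_b):
    """Similarity in [0, 1] between two SMILES strings.

    Assumption: a generic sequence-matching ratio stands in for
    whatever string metric the authors actually used.
    """
    return SequenceMatcher(None, smiles_a, smiles_b).ratio()


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors.

    Assumption: cosine is one common choice; the source does not
    name the vector metric. Returns 0.0 if either vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, candidates, k, similarity):
    """Return the k candidates most similar to target, best first."""
    ranked = sorted(candidates, key=lambda c: similarity(target, c), reverse=True)
    return ranked[:k]


# Hypothetical usage: pick the training molecules closest to ethanol ("CCO").
pool = ["CCN", "c1ccccc1", "CCO"]
selected = most_similar("CCO", pool, 2, string_similarity)
```

A selection step like this could be used to build a training subset of molecules similar to the one of interest, which is the idea behind the hypothesis stated above. How the authors actually select and use similar data is not described in the abstract.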

Published

2025-05-30

How to Cite

Elizaran, A., & Schwartz Pomeraniec, G. A. (2025). Molekulen propietateen iragarpena sare neuronalen bitartez datu-urritasun egoeretan. IkerGazte. Nazioarteko Ikerketa Euskaraz, 5, 101–108. https://doi.org/10.26876/ikergazte.vi.05.12