Named entity tagging in the health domain (Osasun-arloko entitate izendunen etiketatzea)

Authors

  • Paula Ontalvilla University of the Basque Country, UPV/EHU
  • Aitziber Atutxa University of the Basque Country, UPV/EHU
  • Maite Oronoz University of the Basque Country, UPV/EHU

DOI:

https://doi.org/10.26876/ikergazte.v.03.12

Keywords:

Named Entity Recognition, language models, Wikidata, medicine

Abstract

This work has a twofold objective: first, it identifies named entities using transformer-based language models; second, it links the identified clinical entities to the diseases and symptoms in the Wikidata knowledge base. To identify the entities, experiments were performed on the MedMentions biomedical corpus with a general pre-trained language model, BERT (BERT small), and two domain-specialised BERT models (BiomedNLP-PubMedBERT and BioBERT). When assessing only whether a sequence of tokens constitutes a medical entity, an F1 score of 0.819 was obtained; when also assessing the specific class to which the entity belongs, the F1 score was 0.62. In addition, a first attempt at linking the recognized entities to Wikidata using the Levenshtein distance achieved a recall close to 50%.

Published

2023-05-09

How to Cite

Ontalvilla, P., Atutxa, A., & Oronoz, M. (2023). Osasun-arloko entitate izendunen etiketatzea. IkerGazte. Nazioarteko Ikerketa Euskaraz, 3, 91–98. https://doi.org/10.26876/ikergazte.v.03.12