Document Type

Article

Publication Date

12-1-2023

Keywords

JGM, Humans, Algorithms, Sequence Alignment, Language, Electronic Health Records, Publications

JAX Source

Bioinformatics. 2023;39(12).

ISSN

1367-4811

PMID

38001031

DOI

https://doi.org/10.1093/bioinformatics/btad716

Grant

This work was supported by Shriners Children’s Grant [grant number 70904], EMBL-EBI Core Funding, NIH NHGRI [1U24HG011449-01A1], NIH Office of the Director [2R24OD011883-05A1], and the European Union’s Horizon 2020 Research and Innovation Program [grant agreement number 779257] (SOLVE-RD). This research was partly supported by the NIH Intramural Research Program, National Library of Medicine.

Abstract

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts.

RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHRs (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches.
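
The two-stage idea described above (screen by shared k-mers, then score the surviving candidates) can be illustrated with a minimal Python sketch. This is not the Fenominal implementation or its API: the function names, the choice of k = 3, the `min_shared` threshold, and the use of plain Levenshtein distance in place of a score derived from observed clinical spelling-error patterns are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional


def kmer_counts(text: str, k: int = 3) -> Counter:
    """Count overlapping character k-mers in a lower-cased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmers(a: Counter, b: Counter) -> int:
    """Size of the multiset intersection of two k-mer count tables."""
    return sum((a & b).values())


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (hypothetical stand-in for an
    error-pattern-aware alignment score)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def link(mention: str, labels: List[str], k: int = 3,
         min_shared: int = 2) -> Optional[str]:
    """Screen labels by shared k-mers, then return the closest survivor."""
    query = kmer_counts(mention, k)
    # Stage 1: cheap screen, keep labels sharing enough k-mers with the mention.
    candidates = [lab for lab in labels
                  if shared_kmers(query, kmer_counts(lab, k)) >= min_shared]
    if not candidates:
        return None
    # Stage 2: score survivors; the smallest edit distance wins.
    return min(candidates,
               key=lambda lab: edit_distance(mention.lower(), lab.lower()))


labels = ["Scoliosis", "Seizure", "Short stature"]
print(link("scolisis", labels))  # misspelled mention -> "Scoliosis"
```

In this sketch the screening step discards most labels without running any alignment, which is the efficiency argument borrowed from BLAST; a fuller version would replace `edit_distance` with costs weighted by how often each kind of typographical error occurs in clinical notes.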

AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.

Comments

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
