Faculty Research 2021

PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology.

Ling Luo
Shankai Yan
Po-Ting Lai
Daniel Veltri
Andrew Oler
Sandhya Xirasagar
Rajarshi Ghosh
Morgan Similuk
Peter N Robinson, The Jackson Laboratory
Zhiyong Lu

Document Type

Article

Publication Date

1-20-2021

Publication Title

Bioinformatics (Oxford, England)

Keywords

JGM

JAX Source

Bioinformatics 2021; 37(13):1884-1890

Volume

37

Issue

13

First Page

1884

Last Page

1890

ISSN

1367-4811

PMID

33471061

DOI

https://doi.org/10.1016/j.xgen.2021.100029

Abstract

MOTIVATION: Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous work on this task typically uses dictionary-based matching methods, which can achieve high precision but suffer from lower recall. More recently, machine learning-based methods have been proposed to identify biomedical concepts; because they learn features automatically, they can recognize more concept synonyms that were not seen during training. However, most of these methods require large corpora of manually annotated data for model training, which are difficult to obtain because human annotation is costly.

RESULTS: In this paper, we propose PhenoTagger, a hybrid method that combines dictionary-based and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a deep learning model is trained to classify each candidate phrase (an n-gram from the input sentence) into a corresponding concept label. Finally, the dictionary-based and machine learning-based prediction results are combined for improved performance. Our method is validated on two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method and to investigate the effect of training on different ontologies, we retrained PhenoTagger using the disease ontology MEDIC for disease concept recognition. Experimental results on the NCBI disease corpus show that PhenoTagger, without requiring manually annotated training data, achieves performance competitive with state-of-the-art supervised methods.
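The four steps described in the RESULTS paragraph (dictionary construction, n-gram candidate generation, distant labelling, and merging of dictionary and model results) can be illustrated with a minimal Python sketch. It is not PhenoTagger's code and does not reproduce its deep learning model: the toy ontology entries, the function names, the 4-token n-gram limit, the 0.9 score threshold and the "dictionary wins on the same span" merge rule are all hypothetical choices made for illustration, and the classifier is replaced by hand-written prediction scores.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token offsets [start, end)

# Hypothetical toy stand-in for HPO: concept ID -> names and synonyms (illustrative only).
TOY_ONTOLOGY: Dict[str, List[str]] = {
    "HP:0001250": ["seizure", "seizures"],
    "HP:0000252": ["microcephaly", "small head"],
}


def build_dictionary(ontology: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every lowercased concept name or synonym to its concept ID."""
    return {name.lower(): cid for cid, names in ontology.items() for name in names}


def candidate_ngrams(tokens: List[str], max_n: int = 4) -> List[Tuple[Span, str]]:
    """Enumerate every n-gram of 1..max_n tokens as a lowercased candidate phrase."""
    out = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_n, len(tokens)) + 1):
            out.append(((i, j), " ".join(tokens[i:j]).lower()))
    return out


def dictionary_match(tokens: List[str], lexicon: Dict[str, str], max_n: int = 4) -> Dict[Span, str]:
    """Exact dictionary lookup over all candidate n-grams."""
    return {span: lexicon[p] for span, p in candidate_ngrams(tokens, max_n) if p in lexicon}


def distant_labels(sentences: List[List[str]], lexicon: Dict[str, str], max_n: int = 4) -> List[Tuple[str, str]]:
    """Distant supervision: label each candidate with its dictionary concept, else 'None'."""
    return [(p, lexicon.get(p, "None"))
            for tokens in sentences
            for _, p in candidate_ngrams(tokens, max_n)]


def combine(dict_hits: Dict[Span, str],
            model_hits: Dict[Span, Tuple[str, float]],
            threshold: float = 0.9) -> Dict[Span, str]:
    """Union of dictionary hits and confident model predictions.

    Assumed policy: model predictions below `threshold` or labelled 'None' are
    dropped, and dictionary hits override the model on the same span.
    """
    merged = {span: cid for span, (cid, score) in model_hits.items()
              if cid != "None" and score >= threshold}
    merged.update(dict_hits)
    return merged


if __name__ == "__main__":
    lexicon = build_dictionary(TOY_ONTOLOGY)
    tokens = "The patient had recurrent seizures and a small head".split()
    dict_hits = dictionary_match(tokens, lexicon)
    # Hypothetical classifier scores standing in for the trained model's output.
    model_hits = {(3, 5): ("HP:0001250", 0.97), (0, 2): ("None", 0.99)}
    print(combine(dict_hits, model_hits))
```

Here the dictionary finds only exact synonym matches ("seizures", "small head"), while the stand-in model output contributes the longer span "recurrent seizures"; a real system would also need a rule for resolving such overlapping spans, which this sketch leaves out.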

AVAILABILITY: The source code, API information and data for PhenoTagger are freely available at https://github.com/ncbi-nlp/PhenoTagger.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Recommended Citation

Luo L, Yan S, Lai P, Veltri D, Oler A, Xirasagar S, Ghosh R, Similuk M, Robinson P, Lu Z. PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology. Bioinformatics 2021; 37(13):1884-1890
