Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.

Document Type

Article

Publication Date

12-1-2021

Publication Title

NAR Genom Bioinform

JAX Source

NAR Genom Bioinform 2021 Dec 8; 3(4):lqab113

Volume

3

Issue

4

First Page

113

Last Page

113

ISSN

2631-9268

PMID

34888523

DOI

https://doi.org/10.1093/nargab/lqab113

Grant

CA239108, CA034196, CA224370

Abstract

Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.
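The workflow outlined in the abstract (100-dimensional embeddings for PKs and cancers, labeled PK-cancer pairs, random forest classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The embeddings and labels below are synthetic stand-ins, the names `kinases`, `cancers` and `pair_features` are hypothetical, and encoding a pair as the difference of its two vectors is an assumption made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
DIM = 100  # embedding size stated in the abstract

# Hypothetical stand-ins for embeddings learned from PubMed abstracts.
kinases = {f"PK{i}": rng.normal(size=DIM) for i in range(20)}
cancers = {f"cancer{j}": rng.normal(size=DIM) for j in range(10)}


def pair_features(pk: str, cancer: str) -> np.ndarray:
    """Encode a PK-cancer pair as a single vector.

    Assumption: the pair is represented by the difference of the two
    embeddings; other encodings (e.g. concatenation) are equally possible.
    """
    return kinases[pk] - cancers[cancer]


# Every PK-cancer combination becomes one row of the feature matrix.
pairs = [(pk, c) for pk in kinases for c in cancers]
X = np.stack([pair_features(pk, c) for pk, c in pairs])

# Synthetic labels standing in for "a trial links this PK to this cancer".
# In the described approach these would come from ClinicalTrials.gov.
hidden_direction = rng.normal(size=DIM)
y = (X @ hidden_direction > 0).astype(int)

# Train on the first 150 pairs, score the remaining 50 (held-out kinases).
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:150], y[:150])
scores = clf.predict_proba(X[150:])[:, 1]

# Rank held-out pairs by predicted probability of a positive association.
ranked = sorted(zip(pairs[150:], scores), key=lambda item: -item[1])
for (pk, cancer), score in ranked[:5]:
    print(f"{pk:>5} / {cancer:<8} score={score:.2f}")
```

With real data, the historical evaluation mentioned in the abstract would correspond to training on embeddings and trial records available up to a cutoff year and scoring pairs that entered trials only afterwards.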

Comments

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.

Recommended Citation

Ravanmehr V, Blau H, Cappelletti L, Fontana T, Carmody L, Coleman B, George J, Reese J, Joachimiak M, Bocci G, Hansen P, Bult C, Rueter J, Casiraghi E, Valentini G, Mungall C, Oprea T, Robinson P. Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer. NAR Genom Bioinform 2021 Dec 8; 3(4):lqab113

Faculty Research 2021
