Evaluating machine learning approaches for deciding relevance of ArrayExpress experiments for the Gene Expression Database.
Document Type
Article
Publication Date
Summer 2017
JAX Location
In: Student Reports, Summer 2017, Jackson Laboratory
Sponsor
Dr. James Kadin and Dr. Richard Baldarelli
Abstract
The goal of this study was to design a machine learning model that accurately identifies ArrayExpress experiments relevant to the Gene Expression Database (GXD). Previously curated GXD experiment descriptions served as training data for several machine learning algorithms, whose performance was compared using a combination of precision and recall scores. Two linear models were chosen for additional testing and tuning because they achieved the most promising precision and recall. The parameters of each model were tuned using cross-validation. The tuned models achieved high recall of relevant experiments with moderate precision, implying that they could be deployed in the GXD curation process to save a significant amount of manual effort. Close analysis of the misclassified experiments revealed possible directions for model improvement.
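As an illustration of how precision and recall can be combined into a single comparison score, here is a minimal Python sketch. The abstract does not say which combination the study used, so the F-beta score (with beta = 2, which weights recall more heavily) is an assumption, as are the function names and the made-up labels.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 = relevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision, recall, beta=2.0):
    """Combine precision and recall; beta > 1 weights recall more heavily.

    Assumption: the report may have used a different combination.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels for illustration only: 1 = relevant to GXD, 0 = not relevant.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
score = f_beta(precision, recall, beta=2.0)
print(f"precision={precision:.3f} recall={recall:.3f} F2={score:.3f}")
```

In this toy case every relevant experiment is found (recall 1.0) at the cost of two false positives (precision 2/3), which mirrors the high-recall, moderate-precision trade-off the abstract describes: curators would review a few irrelevant experiments but miss none of the relevant ones.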
Recommended Citation
Boukataya, Yasmine, "Evaluating machine learning approaches for deciding relevance of ArrayExpress experiments for the Gene Expression Database." (2017). Summer and Academic Year Student Reports. 2565.
https://mouseion.jax.org/strp/2565