Faculty Research 2022

Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice.

Bruno C Perez
Marco C A M Bink
Karen L. Svenson, The Jackson Laboratory
Gary Churchill, The Jackson Laboratory
Mario P L Calus

Document Type

Article

Publication Date

4-2022

Publication Title

G3 (Bethesda)

Keywords

JMG, Animals, Genomics, Genotype, Linear Models, Mice, Multifactorial Inheritance, Phenotype

JAX Source

G3 (Bethesda) 2022 Apr; 12(4):jkac039

Volume

12

Issue

4

ISSN

2160-1836

PMID

35166767

DOI

https://doi.org/10.1093/g3journal/jkac039

Grant

GM070683

Abstract

We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. Using a subset of top markers selected from a gradient boosting machine model helped for some of the traits to improve the accuracy of prediction when these were fitted into linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects.
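The validation scheme described in the abstract (youngest generation held out, all older generations used as reference) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' pipeline: the data are synthetic, ridge regression on markers stands in for the linear methods (it is not GBLUP, BayesB, or elastic net as fitted in the article), "accuracy" is taken to be the Pearson correlation between predictions and phenotypes, and "relative RMSE" is taken to be RMSE divided by the standard deviation of the validation phenotypes. The function names are hypothetical.

```python
import numpy as np


def split_by_generation(generation):
    """Return boolean masks (reference, validation).

    Validation = animals in the youngest (highest-numbered) generation;
    reference = animals in all older generations.
    """
    generation = np.asarray(generation)
    validation = generation == generation.max()
    return ~validation, validation


def ridge_fit_predict(X_ref, y_ref, X_val, lam=1.0):
    """Ridge regression on centered markers (illustrative linear model only)."""
    mu_x = X_ref.mean(axis=0)
    mu_y = y_ref.mean()
    Xc = X_ref - mu_x
    # Dual (n x n) form: beta = Xc' (Xc Xc' + lam I)^-1 (y - mean),
    # cheaper than the primal form when markers outnumber animals.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_ref - mu_y)
    beta = Xc.T @ alpha
    return mu_y + (X_val - mu_x) @ beta


def accuracy(y_true, y_pred):
    """Assumed definition: Pearson correlation of predictions and phenotypes."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def relative_rmse(y_true, y_pred):
    """Assumed definition: RMSE scaled by the SD of the observed phenotypes."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / np.std(y_true))


# Synthetic example (not the mouse data): 6 generations of 50 animals, 500 SNPs.
rng = np.random.default_rng(0)
generation = np.repeat(np.arange(1, 7), 50)
X = rng.integers(0, 3, size=(generation.size, 500)).astype(float)
true_effects = np.zeros(500)
true_effects[:20] = rng.normal(size=20)  # 20 markers with additive effects
y = X @ true_effects + rng.normal(size=generation.size)

ref, val = split_by_generation(generation)
y_hat = ridge_fit_predict(X[ref], y[ref], X[val], lam=1.0)
acc = accuracy(y[val], y_hat)
rel = relative_rmse(y[val], y_hat)
print(f"accuracy={acc:.3f}  relative RMSE={rel:.3f}")
```

A gradient boosting machine could be compared under the same split by swapping out `ridge_fit_predict` while keeping the masks and the two metrics unchanged, so that both models are scored on the same validation animals.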

Comments

This is an Open Access article distributed under the terms of the Creative Commons Attribution License.

Recommended Citation

Perez B, Bink M, Svenson KL, Churchill G, Calus M. Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice. G3 (Bethesda) 2022 Apr; 12(4):jkac039

Included in

Life Sciences Commons, Medicine and Health Sciences Commons
