Soft Windowing Application to Improve Analysis of High-throughput Phenotyping Data.

Hamed Haselimashhadi
Jeremy C Mason
Violeta Munoz-Fuentes
Federico López-Gómez
Kolawole Babalola
Elif F Acar
Vivek Kumar, The Jackson Laboratory
Jacqueline K White, The Jackson Laboratory
Ann M Flenniken
Ruairidh King
Ewan Straiton
John Richard Seavitt
Angelina Gaspero
Arturo Garza
Audrey E Christianson
Chih-Wei Hsu
Corey L Reynolds
Denise G Lanza
Isabel Lorenzo
Jennie R Green
Juan J Gallegos
Ritu Bohat
Rodney C Samaco
Surabi Veeraragavan
Jong Kyoung Kim
Gregor Miller
Helmut Fuchs
Lillian Garrett
Lore Becker
Yeon Kyung Kang
David Clary
Soo Young Cho
Masaru Tamura
Nobuhiko Tanaka
Kyung Dong Soo
Alexandr Bezginov
Ghina Bou About
Marie-France Champy
Laurent Vasseur
Sophie Leblanc
Hamid Meziane
Mohammed Selloum
Patrick T Reilly
Nadine Spielmann
Holger Maier
Valerie Gailus-Durner
Tania Sorg
Masuya Hiroshi
Obata Yuichi
Jason D Heaney
Mary E Dickinson
Wurst Wolfgang
Glauco P Tocchini-Valentini
Kevin C Kent Lloyd
Colin McKerlie
Je Kyung Seong
Herault Yann
Martin Hrabé de Angelis
Steve D M Brown
Damian Smedley
Paul Flicek
Ann-Marie Mallon
Helen Parkinson
Terrence F Meehan

Abstract

MOTIVATION: High-throughput phenomic projects generate complex data from small treatment groups and large control groups. The large control groups increase the power of the analyses, but because they are collected over long periods they also introduce variation over time. A method is needed to utilise a set of temporally local controls that maximises analytic power while minimising noise from unspecified environmental factors.

RESULTS: Here we introduce "soft windowing", a methodological approach that selects a window of time that includes the most appropriate controls for analysis. Using phenotype data from the International Mouse Phenotyping Consortium (IMPC), adaptive windows were applied such that control data collected proximally to mutants were assigned the maximal weight, while data collected earlier or later had less weight. We applied this method to IMPC data and compared the results with those obtained from a standard non-windowed approach. Validation was performed using a resampling approach, which demonstrated a 10% reduction in false positives across 2.5 million analyses. We also applied the method to our production analysis pipeline, which establishes genotype-phenotype associations by comparing mutant versus control data. Across a set of 2,082 mutant mouse lines, we report a 30% increase in significant p-values, as well as linkage to 106 versus 99 disease models via phenotype overlap for the soft-windowed and non-windowed approaches, respectively. Our method is generalisable and can benefit large-scale human phenomic projects such as the UK Biobank and the All of Us resources.
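To make the weighting idea concrete, the following is a minimal Python sketch, not the SmoothWin implementation (which is an R package). It assumes a window shaped as the product of two logistic curves. Controls measured within l days of the mutant measurement dates receive weights close to 1, and weights decay smoothly toward 0 for controls collected much earlier or later. The function names and the parameters k (steepness) and l (half-width padding, in days) are hypothetical and chosen only for illustration.

```python
import math


def _logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_window_weights(control_days, mutant_days, k=1.0, l=10.0):
    """Hypothetical soft-window weights for control measurements.

    The window spans [first mutant day - l, last mutant day + l]. Inside it,
    weights are near 1; outside it, they fall off smoothly, with k setting
    how sharply the edges decay.
    """
    lo = min(mutant_days) - l
    hi = max(mutant_days) + l
    return [_logistic(k * (t - lo)) * _logistic(k * (hi - t)) for t in control_days]


def weighted_mean(values, weights):
    """Weighted average of control values using the soft-window weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Usage: mutants measured around day 100; controls spread over 200 days.
control_days = [0, 50, 95, 100, 105, 150, 200]
control_vals = [100, 100, 10, 10, 10, 100, 100]  # distant controls drifted
mutant_days = [98, 100, 102]

w = soft_window_weights(control_days, mutant_days, k=1.0, l=10.0)
print([round(x, 4) for x in w])
print(weighted_mean(control_vals, w))  # dominated by the temporally local controls
```

A smooth window is used here instead of a hard cutoff so that no control is abruptly discarded at an arbitrary boundary. Distant controls contribute negligibly rather than not at all. In a full analysis, such weights would typically be passed to a weighted regression comparing mutants with controls.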

AVAILABILITY AND IMPLEMENTATION: The method is implemented in the freely available R package SmoothWin, distributed on CRAN at http://CRAN.R-project.org/package=SmoothWin.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.