Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci.

Jingtao Lilue
Anthony G Doran
Ian T Fiddes
Monica Abrudan
Joel Armstrong
Ruth Bennett
William Chow
Joanna Collins
Stephan Collins
Anne M Czechanski, The Jackson Laboratory
Petr Danecek
Mark Diekhans
Dirk-Dominik Dolle
Matt Dunn
Richard Durbin
Dent Earl
Anne Ferguson-Smith
Paul Flicek
Jonathan Flint
Adam Frankish
Beiyuan Fu
Mark Gerstein
James Gilbert
Leo Goodstadt
Jennifer Harrow
Kerstin Howe
Ximena Ibarra-Soria
Mikhail Kolmogorov
Chris J Lelliott
Darren W Logan
Jane Loveland
Clayton E Mathews
Richard Mott
Paul Muir
Stefanie Nachtweide
Fabio C P Navarro
Duncan T Odom
Naomi Park
Sarah Pelan
Son K Pham
Mike Quail
Laura Reinholdt
Lars Romoth
Lesley Shirley
Cristina Sisu
Marcela Sjoberg-Herrera
Mario Stanke
Charles Steward
Mark Thomas
Glen Threadgold
David Thybert
James Torrance
Kim Wong
Jonathan Wood
Binnaz Yalcin
Fengtang Yang
David J Adams
Benedict Paten
Thomas M Keane

Abstract

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity, as well as for transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, completing 10 new gene structures and adding 62 new coding loci to the reference genome annotation. The assemblies also revealed a large, previously unannotated gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.