Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Peter Ebert
Peter A Audano
Qihui Zhu, The Jackson Laboratory
Bernardo Rodriguez-Martin
David Porubsky
Marc Jan Bonder
Arvis Sulovari
Jana Ebler
Weichen Zhou
Rebecca Serra Mari
Feyza Yilmaz, The Jackson Laboratory
Xuefang Zhao
PingHsun Hsieh
Joyce Lee
Sushant Kumar
Jiadong Lin
Tobias Rausch
Yu Chen
Jingwen Ren
Martin Santamarina
Wolfram Höps
Hufsah Ashraf
Nelson T Chuang
Xiaofei Yang
Katherine M Munson
Alexandra P Lewis
Susan Fairley
Luke J Tallon
Wayne E Clarke
Anna O Basile
Marta Byrska-Bishop
André Corvelo
Uday S Evani
Tsung-Yu Lu
Mark J P Chaisson
Junjie Chen
Chong Li
Harrison Brand
Aaron M Wenger
Maryam Ghareghani
William T Harvey
Benjamin Raeder
Patrick Hasenfeld
Allison A Regier
Haley J Abel
Ira M Hall
Paul Flicek
Oliver Stegle
Mark B Gerstein
Jose M C Tubio
Zepeng Mu
Yang I Li
Xinghua Shi
Alex R Hastie
Kai Ye
Zechen Chong
Ashley D Sanders
Michael C Zody
Michael E Talkowski
Ryan E Mills
Scott E Devine
Charles Lee, The Jackson Laboratory
Jan O Korbel
Tobias Marschall
Evan E Eichler

Abstract

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic variation even across complex loci. We identify 107,590 structural variants (SVs), of which 68% are not discovered by short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterize 130 of the most active mobile element source elements and find that 63% of all SVs arise by homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
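For readers unfamiliar with the contiguity statistic quoted above: the contig N50 is conventionally the length L such that contigs of length L or longer together cover at least half of all assembled bases. The following is a minimal sketch of that definition, assuming contig lengths are available as a plain list of integers. The function name `contig_n50` and the toy lengths are illustrative assumptions and are not taken from the study's pipeline or data.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Hypothetical helper for illustration only; not the authors' code.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # The first contig at which we reach half the total defines N50.
        if 2 * running >= total:
            return length


# Toy lengths (made up for the example, not assembly data).
toy_lengths = [40, 30, 20, 10]
print(contig_n50(toy_lengths))  # prints 30
```

In the toy input, the two longest contigs (40 + 30 = 70) are the first to cover at least half of the 100 total bases, so the N50 is 30. An average contig N50, as reported in the abstract, would be this value computed per haplotype assembly and then averaged across assemblies.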