Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Document Type

Article

Publication Date

April 2, 2021

Publication Title

Science

Keywords

JGM; Female; Genetic Variation; Genome, Human; Genotype; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; INDEL Mutation; Interspersed Repetitive Sequences; Male; Population Groups; Quantitative Trait Loci; Retroelements; Sequence Analysis, DNA; Sequence Inversion; Whole Genome Sequencing

JAX Source

Science 2021 Apr 2; 372(6537):eabf7117

Volume

372

Issue

6537

ISSN

1095-9203

PMID

33632895

DOI

https://doi.org/10.1126/science.abf7117

Abstract

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
