Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads.
Document Type
Article
Publication Date
3-2021
Publication Title
Nature biotechnology
Keywords
JGM, Algorithms, Genome, Human, Haplotypes, High-Throughput Nucleotide Sequencing, Humans, Parents, Puerto Rico, Sequence Analysis, DNA, Single-Cell Analysis
JAX Source
Nat Biotechnol 2021 Mar; 39(3):302-308
Volume
39
Issue
3
First Page
302
Last Page
308
ISSN
1546-1696
PMID
33288906
DOI
https://doi.org/10.1038/s41587-020-0719-5
Abstract
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing1,2 with continuous long-read or high-fidelity3 sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
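The three headline metrics in the abstract (quality value, contig N50 and switch error rate) follow standard definitions, which the minimal Python sketch below illustrates. It is not the authors' evaluation pipeline. The function names and the toy contig lengths are hypothetical, and the quality value is assumed to be Phred-scaled, as is conventional for assembly accuracy.

```python
import math


def contig_n50(lengths):
    """Return the N50: the length of the contig at which the running sum of
    contig lengths, taken from longest to shortest, first reaches half of
    the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability
    (assumption: QV = -10 * log10(error rate))."""
    return 10 ** (-qv / 10)


def switch_error_rate(switches, adjacent_het_pairs):
    """Fraction of adjacent heterozygous variant pairs whose phase flips
    relative to a truth set."""
    if adjacent_het_pairs <= 0:
        raise ValueError("need at least one adjacent heterozygous pair")
    return switches / adjacent_het_pairs


if __name__ == "__main__":
    toy_lengths = [10, 20, 30, 40]  # hypothetical contig lengths, not real data
    print("N50:", contig_n50(toy_lengths))
    print("error rate at QV 40:", qv_to_error_rate(40))
```

Under this definition, a quality value above 40 corresponds to fewer than one expected base error per 10,000 bases.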
Recommended Citation
Porubsky D,
Ebert P,
Audano P,
Vollger M,
Harvey W,
Marijon P,
Ebler J,
Munson K,
Sorensen M,
Sulovari A,
Haukness M,
Ghareghani M,
Human Genome Structural Variation Consortium,
Lansdorp P,
Paten B,
Devine S,
Sanders A,
Lee C,
Chaisson M,
Korbel J,
Eichler E,
Marschall T.
Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat Biotechnol 2021 Mar; 39(3):302-308