Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads.
JGM; Algorithms; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Parents; Puerto Rico; Sequence Analysis, DNA; Single-Cell Analysis
Nat Biotechnol 2021 Mar; 39(3):302-308
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
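The three quality measures quoted in the abstract (quality value, contig N50, switch error rate) follow standard definitions, which the minimal Python sketch below illustrates. It assumes the usual Phred-scaled reading of the quality value and a simple adjacent-site definition of a switch error. The function names and toy inputs are hypothetical and are not taken from the authors' pipeline, which may compute these metrics differently.

```python
def contig_n50(lengths):
    """Length L such that contigs of length >= L cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to an expected per-base error rate."""
    return 10 ** (-qv / 10)


def switch_error_rate(truth, phased):
    """Fraction of adjacent heterozygous sites whose relative phase differs from truth.

    Both arguments list the allele (0 or 1) carried by haplotype 1 at each site.
    """
    if len(truth) != len(phased):
        raise ValueError("haplotypes must cover the same sites")
    if len(truth) < 2:
        raise ValueError("need at least two sites")
    # agree[i] records whether the phased allele matches truth at site i;
    # a change in agreement between neighbouring sites counts as one switch.
    agree = [t == p for t, p in zip(truth, phased)]
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(truth) - 1)


if __name__ == "__main__":
    # Toy contig lengths, invented for illustration only.
    print(contig_n50([10, 8, 5, 3, 2]))
    # A quality value of 40 corresponds to about one error per 10,000 bases.
    print(qv_to_error_rate(40))
    # One switch across three adjacent site pairs.
    print(switch_error_rate([0, 0, 0, 0], [0, 0, 1, 1]))
```

Under these definitions, a quality value above 40 means fewer than one expected error per 10,000 bases, and a switch error rate of 0.17% means roughly 1 in 600 adjacent heterozygous site pairs is phased inconsistently.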
Porubsky, David; Ebert, Peter; Audano, Peter A; Vollger, Mitchell R; Harvey, William T; Marijon, Pierre; Ebler, Jana; Munson, Katherine M; Sorensen, Melanie; Sulovari, Arvis; Haukness, Marina; Ghareghani, Maryam; Human Genome Structural Variation Consortium; Lansdorp, Peter M; Paten, Benedict; Devine, Scott E; Sanders, Ashley D; Lee, Charles; Chaisson, Mark J P; Korbel, Jan O; Eichler, Evan E; and Marschall, Tobias, "Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads." (2021). Faculty Research 2021. 300.