Document Type

Article

Publication Date

2-7-2024

Keywords

JGM, Humans, Sequence Analysis, Algorithms, Genome, Human, Genomic Structural Variation, Bias, Sequence Analysis, DNA, High-Throughput Nucleotide Sequencing

JAX Source

Genome Res. 2024;34(1):7-19

ISSN

1549-5469

PMID

38176712

DOI

https://doi.org/10.1101/gr.278203.123

Grant

P.A.A. and C.R.B. were supported by National Institutes of Health (NIH) National Institute of General Medical Sciences R35GM133600 and NIH National Cancer Institute P30CA034196. The Human Genome Structural Variation Consortium (HGSVC) provided published data, support, and feedback, and the HGSVC was supported by NIH National Human Genome Research Institute U24HG007497.

Abstract

High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.

Comments

This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
