Document Type
Article
Publication Date
2-7-2024
Original Citation
Audano P, Beck C. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res. 2024;34(1):7-19
Keywords
JGM, Humans, Sequence Analysis, Algorithms, Genome, Human, Genomic Structural Variation, Bias, Sequence Analysis, DNA, High-Throughput Nucleotide Sequencing
JAX Source
Genome Res. 2024;34(1):7-19
ISSN
1549-5469
PMID
38176712
DOI
https://doi.org/10.1101/gr.278203.123
Grant
P.A.A. and C.R.B. were supported by National Institutes of Health (NIH) National Institute of General Medical Sciences R35GM133600 and NIH National Cancer Institute P30CA034196. The Human Genome Structural Variation Consortium (HGSVC) provided published data, support, and feedback, and the HGSVC was supported by NIH National Human Genome Research Institute U24HG007497.
Abstract
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
Comments
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.