Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.
Chromosome Mapping, Gene Expression Regulation, Genetic Loci, Genetic Variation, Genetics, Population, Genome, Human, Humans, Minisatellite Repeats, Nucleotide Motifs, Quantitative Trait Loci
Nat Commun 2021 Jul 12;12(1):4250
Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and have been associated with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG and use the RPGG to estimate VNTR composition from short reads. We use this approach to discover VNTRs whose length is stratified by continental population, as well as expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.
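The idea of a graph that merges several haplotypes of one locus and is then queried with short reads can be illustrated with a toy sketch. The code below assumes a de Bruijn-style k-mer graph, which the abstract does not specify; the function names build_graph and count_read_kmers, the value of k, and the sequences are hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict


def build_graph(haplotypes, k):
    """Merge the k-mers of every haplotype of a locus into one graph.

    Nodes are k-mers; an edge links two k-mers that are adjacent in at
    least one haplotype. Repeated motifs collapse into cycles, and motif
    variants seen in only some haplotypes appear as extra branches.
    """
    edges = defaultdict(set)
    for seq in haplotypes:
        for i in range(len(seq) - k):
            edges[seq[i:i + k]].add(seq[i + 1:i + k + 1])
        if len(seq) >= k:
            edges[seq[-k:]]  # make sure the last k-mer exists as a node
    return edges


def count_read_kmers(graph, reads, k):
    """Count how often each graph k-mer occurs in the reads.

    K-mers absent from the graph are ignored. The counts are a crude
    proxy for repeat content at the locus (assumption of this sketch).
    """
    counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in graph:
                counts[kmer] += 1
    return dict(counts)


if __name__ == "__main__":
    # Two toy haplotypes: one with a pure ACG repeat, one with an ACT variant.
    graph = build_graph(["ACGACGACGT", "ACGACTACGT"], k=3)
    counts = count_read_kmers(graph, ["ACGACG", "TTTACG"], k=3)
    for kmer in sorted(counts):
        print(kmer, counts[kmer])
```

In this sketch a longer repeat yields more reads covering the repeat k-mers, so higher counts on the cycle nodes would suggest a longer allele; a real analysis would also need to handle sequencing depth, errors, and k-mers shared with other loci.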
Variation Consortium HS,
Dr. Lee and Dr. Zhu are members of the consortium.
This article is licensed under a Creative Commons Attribution 4.0 International License.