Variant Caller Performance on Diverse Strains: Exploring the significance of the reference genome on caller performance
In: Student Reports, Summer 2023, The Jackson Laboratory
Beth Dumont, Ph.D., Laura Blanco-Berdugo, M.S., and Alexis Garretson, M.S.
Variant calling tools identify, with varying levels of success, differences between sequenced samples and a specified reference genome. Reference genomes have generally been assembled for each species from a single strain or population, but newer assemblies are increasingly specific to the strains and populations for which they are relevant. In this study, we explore how the choice of reference genome affects variant caller performance. We compare the performance of four variant callers when using two different reference genomes: a strain-specific genome and the standard mm39 reference genome. We find that variant callers perform much better when strain-specific references are used, demonstrating the importance of population-specific reference genome assemblies for the best analysis of next-generation sequencing data. Further, we observed that among the callers tested, Freebayes was the most conservative with calls, while Mpileup recovered the most variants. The best caller for a project will therefore depend on the researchers’ priorities, for example sensitivity (recall) versus precision.
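The trade-off between a conservative caller and one that recovers more variants can be made concrete with a small scoring sketch. The code below is illustrative only and is not the evaluation procedure used in the report: the `score_calls` helper, the tuple representation of a variant, and all variant sets are hypothetical toy data. It assumes a trusted truth set is available and that calls match only when chromosome, position, reference allele, and alternate allele are identical.

```python
from typing import NamedTuple, Set, Tuple

# A variant is keyed by (chromosome, position, ref allele, alt allele).
# This exact-match representation is a simplifying assumption; real
# comparisons usually normalize indels and handle multi-allelic sites.
Variant = Tuple[str, int, str, str]


class CallerMetrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float  # also called sensitivity


def score_calls(called: Set[Variant], truth: Set[Variant]) -> CallerMetrics:
    """Score one caller's variant set against a truth set (hypothetical helper)."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # present in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return CallerMetrics(tp, fp, fn, precision, recall)


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 900, "T", "C"),
}
# A conservative caller: few calls, all of them correct.
conservative_calls = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A")}
# A permissive caller: finds every true variant plus two spurious ones.
permissive_calls = truth | {("chr1", 500, "G", "C"), ("chr3", 10, "A", "T")}

conservative = score_calls(conservative_calls, truth)
permissive = score_calls(permissive_calls, truth)
print("conservative:", conservative)
print("permissive:  ", permissive)
```

On this toy data the conservative set reaches a precision of 1.0 with a recall of 0.5, while the permissive set reaches a recall of 1.0 with a precision of 4/6. Which profile is preferable depends on whether missed variants or false calls are more costly for the project.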
Roberts, Aleisha, "Variant Caller Performance on Diverse Strains: Exploring the significance of the reference genome on caller performance" (2023). Summer and Academic Year Student Reports. 2763.