Comparison of eQTL Identification when using Single-end or Paired-end RNA-sequencing Data

Authors

Sam Ardery

Document Type

Article

Publication Date

Summer 2021

JAX Location

In: Student Reports, Summer 2021, The Jackson Laboratory

Abstract

RNA sequencing (RNA-seq) of short, 75-100 base pair reads is widely used to measure gene expression genome-wide in biological samples. Short-read RNA-seq can be performed using paired-end (PE) or single-end (SE) sequencing; PE sequencing provides sequence from both ends of each RNA fragment and is therefore assumed to yield more accurate read alignments and estimates of gene expression. However, this assumption has not yet been fully tested in genetically diverse samples, and it is unclear how PE and SE RNA-seq agree or differ in their ability to detect expression quantitative trait loci (eQTL). To characterize these differences, we analyzed PE and SE RNA-seq data from 185 genetically diverse mouse embryonic stem cell lines and compared gene expression estimates and eQTL detections. Genes identified only in the SE analysis were overrepresented for pseudogenes, whereas genes identified only in the PE analysis were overrepresented for protein-coding genes. This pattern held in both the expression and eQTL analyses. These results suggest that PE sequencing provides a more accurate alignment profile than SE sequencing, especially in genetically diverse populations. Moreover, by limiting spurious read alignments to pseudogenes and correctly assigning more reads to protein-coding genes, PE sequencing results in fewer false-positive eQTLs for pseudogenes and fewer false-negative eQTLs for protein-coding genes, and it provides a more accurate picture of gene regulatory variation than SE RNA-seq. Thus, we recommend that researchers use PE RNA-seq in eQTL mapping studies.

Please contact the Joan Staats Library for information regarding this document.
