Title

pyBedGraph: a Python package for fast operations on 1-dimensional genomic signal tracks.

Document Type

Article

Publication Date

2-11-2020

Keywords

JGM

JAX Source

Bioinformatics 2020 Feb 11 [Epub ahead of print]

PMID

32044918

DOI

https://www.ncbi.nlm.nih.gov/pubmed/?term=johanna+i#

Abstract

MOTIVATION: Modern genomic research is driven by next-generation sequencing experiments such as ChIP-seq and ChIA-PET that generate coverage files for transcription factor binding, as well as DHS and ATAC-seq that yield coverage files for chromatin accessibility. Such files are in a bedGraph text format or a bigWig binary format. Obtaining summary statistics in a given region is a fundamental task in analyzing protein binding intensity or chromatin accessibility. However, the existing Python package for operating on coverage files is not optimized for speed.

RESULTS: We developed pyBedGraph, a Python package to quickly obtain summary statistics for a given interval in a bedGraph or a bigWig file. When tested on 12 ChIP-seq, ATAC-seq, RNA-seq, and ChIA-PET datasets, pyBedGraph is on average 260 times faster than the existing program pyBigWig. On average, pyBedGraph can look up the exact mean signal of 1 million regions in ∼0.26 seconds and can compute their approximate means in less than 0.12 seconds on a conventional laptop.
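To illustrate the kind of query described above (the exact mean signal over a region of a bedGraph track), here is a minimal pure-Python sketch. It is not pyBedGraph's implementation and does not use its API. The function names `load_bedgraph_lines`, `build_prefix_sums`, and `region_mean` are hypothetical, and the prefix-sum approach is an assumption chosen for clarity: one simple way to make repeated mean lookups cheap.

```python
from itertools import accumulate


def load_bedgraph_lines(lines, chrom, chrom_size):
    """Expand bedGraph intervals for one chromosome into a per-base list.

    bedGraph data lines have four whitespace-separated fields:
    chrom, start (0-based), end (exclusive), value.
    Hypothetical helper for illustration only.
    """
    signal = [0.0] * chrom_size
    for line in lines:
        fields = line.split()
        # Skip header/track lines and other chromosomes.
        if len(fields) != 4 or fields[0] != chrom:
            continue
        start, end, value = int(fields[1]), int(fields[2]), float(fields[3])
        for pos in range(start, min(end, chrom_size)):
            signal[pos] = value
    return signal


def build_prefix_sums(signal):
    """prefix[i] holds the sum of signal[0:i], so prefix[0] == 0."""
    return [0.0] + list(accumulate(signal))


def region_mean(prefix, start, end):
    """Exact mean of the signal over the half-open region [start, end)."""
    return (prefix[end] - prefix[start]) / (end - start)


# Toy track: chr1 has value 2.0 on [0, 4) and 5.0 on [4, 10).
example = [
    "chr1\t0\t4\t2.0",
    "chr1\t4\t10\t5.0",
    "chr2\t0\t5\t1.0",
]
signal = load_bedgraph_lines(example, "chr1", 10)
prefix = build_prefix_sums(signal)
means = [region_mean(prefix, s, e) for s, e in [(0, 4), (2, 8), (4, 10)]]
print(means)
```

After the one-time prefix-sum pass, each region mean costs two lookups and a division regardless of region length. The trade-off is memory proportional to chromosome length, which this sketch ignores.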

AVAILABILITY: pyBedGraph is publicly available at https://github.com/TheJacksonLaboratory/pyBedGraph under the MIT license.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
