Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

Shashikant Pujar
Nuala A O'Leary
Catherine M Farrell
Jane E Loveland
Jonathan M Mudge
Craig Wallin
Carlos G Girón
Mark Diekhans
If Barnes
Ruth Bennett
Andrew E Berry
Eric Cox
Claire Davidson
Tamara Goldfarb
Jose M Gonzalez
Toby Hunt
John Jackson
Vinita Joardar
Mike P Kay
Vamsi K Kodali
Fergal J Martin
Monica McAndrews, The Jackson Laboratory
Kelly M McGarvey
Michael Murphy
Bhanu Rajput
Sanjida H Rangwala
Lillian D Riddick
Ruth L Seal
Marie-Marthe Suner
David Webb
Sophia Zhu, The Jackson Laboratory
Bronwen L Aken
Elspeth A Bruford
Carol J Bult, The Jackson Laboratory
Adam Frankish
Terence Murphy
Kim D Pruitt

Document Type

Article

Publication Date

1-4-2018

JAX Source

Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

Volume

46

Issue

D1

First Page

221

Last Page

228

ISSN

1362-4962

PMID

29126148

DOI

https://doi.org/10.1093/nar/gkx1031

Grant

HG00030

Abstract

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.

Comments

This article is available under the Creative Commons CC-BY-NC license

Recommended Citation

Pujar S, O'Leary N, Farrell C, Loveland J, Mudge J, Wallin C, Girón C, Diekhans M, Barnes I, Bennett R, Berry A, Cox E, Davidson C, Goldfarb T, Gonzalez J, Hunt T, Jackson J, Joardar V, Kay M, Kodali V, Martin F, McAndrews M, McGarvey K, Murphy M, Rajput B, Rangwala S, Riddick L, Seal R, Suner M, Webb D, Zhu S, Aken B, Bruford E, Bult C, Frankish A, Murphy T, Pruitt K. Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

Faculty Research 2018
