Document Type


Publication Date




JAX Source

F1000 Res. 2022;11:994



Background: Biomedical research over the past two decades has become data and information rich. This trend has been in large part driven by the development of systems-scale molecular profiling capabilities and by the increasingly large volume of publications contributed by the biomedical research community. It has therefore become important for early career researchers to learn to leverage this wealth of information in their own research.

Methods: Here we describe in detail a training curriculum focusing on the development of foundational skills necessary to retrieve, structure, and aggregate information available from vast stores of publicly available information. It is provided along with supporting material and an illustrative use case. The stepwise workflow encompasses; 1) Selecting a candidate gene; 2) Retrieving background information about the gene; 3) Profiling its literature; 4) Identifying in the literature instances where its transcript abundance changes in blood of patients; 5) Retrieving transcriptional profiling data from public blood transcriptome and reference datasets; and 6) Drafting a manuscript, submitting it for peer-review, and publication.

Results: This resource may be leveraged by instructors who wish to organize hands-on workshops. It can also be used by independent trainees as a self-study toolkit. The workflow presented as proof-of- concept was designed to establish a resource for assessing a candidate gene’s potential utility as a blood transcriptional biomarker. Trainees will learn to retrieve literature and public transcriptional profiling data associated with a specific gene of interest. They will also learn to extract, structure, and aggregate this information to support downstream interpretation efforts as well as the preparation of a manuscript.

Conclusions: This resource should support early career researchers in their efforts to acquire skills that will permit them to leverage the vast amounts of publicly available large-scale profiling data.


This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.