A training curriculum for retrieving, structuring, and aggregating information derived from the biomedical literature and large-scale data repositories F1000 Res. 2022;11:994
F1000 Res. 2022;11:994
Background: Biomedical research over the past two decades has become data and information rich. This trend has been in large part driven by the development of systems-scale molecular profiling capabilities and by the increasingly large volume of publications contributed by the biomedical research community. It has therefore become important for early career researchers to learn to leverage this wealth of information in their own research.
Methods: Here we describe in detail a training curriculum focusing on the development of foundational skills necessary to retrieve, structure, and aggregate information available from vast stores of publicly available information. It is provided along with supporting material and an illustrative use case. The stepwise workflow encompasses; 1) Selecting a candidate gene; 2) Retrieving background information about the gene; 3) Profiling its literature; 4) Identifying in the literature instances where its transcript abundance changes in blood of patients; 5) Retrieving transcriptional profiling data from public blood transcriptome and reference datasets; and 6) Drafting a manuscript, submitting it for peer-review, and publication.
Results: This resource may be leveraged by instructors who wish to organize hands-on workshops. It can also be used by independent trainees as a self-study toolkit. The workflow presented as proof-of- concept was designed to establish a resource for assessing a candidate gene’s potential utility as a blood transcriptional biomarker. Trainees will learn to retrieve literature and public transcriptional profiling data associated with a specific gene of interest. They will also learn to extract, structure, and aggregate this information to support downstream interpretation efforts as well as the preparation of a manuscript.
Conclusions: This resource should support early career researchers in their efforts to acquire skills that will permit them to leverage the vast amounts of publicly available large-scale profiling data.