Data Annotation Module for Frictionless Data

Authors

Naomie Gao

Document Type

Article

Publication Date

Summer 2021

JAX Location

In: Student Reports, Summer 2021, The Jackson Laboratory

Abstract

Background: Heterogeneity in how data are described, and in how the procedures that produced and altered them are documented, often makes the data difficult for scientists to understand and reuse. Creating a “Frictionless” module for uniform data annotation entails an end-to-end workflow spanning data planning, acquisition, QC, and subsequent updates. The module also packages the data with complete descriptions and metadata so that the data adhere to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, which leads to greater reusability of research data and makes it more feasible for machines to act upon the data. Methods: The Python Frictionless data package was at the center of the project, with various technologies used and created to host the capabilities provided by Frictionless. These capabilities include describing the variables within a dataset, their constraints, and the URL of the relevant ontology, as well as describing the file as a whole. Results: A new suite of interfaces for providing data annotations was created: a web-based form for accepting and viewing inputs, and a command line interface (CLI) that allows users to access and view data annotations from local and server-based .cube files through the terminal. We refer to this collection of tools as the Data Annotation module. This work is being integrated into the JAX BioConnect ecosystem as a part of the Cube Initiative. Conclusion: The Data Annotation module supports the description of any tabular file in a consistent format. This makes data contents easier to understand, which allows data to be more readily found, reused, and computed upon, and helps identify connections between different data sets. It also has potential for applications of artificial intelligence and machine learning, which rely on such uniform attributes.
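As an illustration of the kind of annotation described in the Methods, the following is a minimal sketch of a descriptor that records variable descriptions, constraints, an ontology URL, and a description of the file as a whole. It uses only the Python standard library and the publicly documented Frictionless Table Schema and Data Package keys (`fields`, `constraints`, `rdfType`, `resources`). The dataset, field names, file path, and ontology URL are hypothetical placeholders, and `check_row` is an illustrative helper. None of this is the Data Annotation module's own code or the .cube file format, neither of which is described here.

```python
import json


def build_package_descriptor():
    """Return a Data Package style descriptor for a hypothetical CSV file."""
    schema = {
        "fields": [
            {
                "name": "mouse_id",  # hypothetical variable
                "type": "string",
                "description": "Identifier of the animal",
                "constraints": {"required": True, "unique": True},
            },
            {
                "name": "body_weight_g",  # hypothetical variable
                "type": "number",
                "description": "Body weight in grams",
                "constraints": {"minimum": 0, "maximum": 100},
                # Placeholder URL standing in for a real ontology term.
                "rdfType": "http://example.org/ontology/body_weight",
            },
        ]
    }
    resource = {
        "name": "body-weights",
        "path": "body_weights.csv",  # hypothetical file
        "title": "Body weights",
        "description": "Describes the file as a whole.",
        "schema": schema,
    }
    return {"name": "example-annotated-dataset", "resources": [resource]}


def check_row(schema, row):
    """Illustrative check of one row (a dict) against a few constraints.

    Only 'required', 'minimum', and 'maximum' are handled; this is not a
    substitute for a full Frictionless validation.
    """
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        value = row.get(name)
        if value is None:
            if constraints.get("required"):
                problems.append(f"{name}: required value is missing")
            continue
        if "minimum" in constraints and value < constraints["minimum"]:
            problems.append(f"{name}: {value} is below the minimum")
        if "maximum" in constraints and value > constraints["maximum"]:
            problems.append(f"{name}: {value} is above the maximum")
    return problems


if __name__ == "__main__":
    descriptor = build_package_descriptor()
    print(json.dumps(descriptor, indent=2))
    table_schema = descriptor["resources"][0]["schema"]
    print(check_row(table_schema, {"mouse_id": "M001", "body_weight_g": 24.5}))
```

Keeping the annotation as plain JSON-compatible data means the same descriptor could be rendered by a web form, printed by a command line tool, or read by other software, which is the property the abstract attributes to uniform annotation.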

Please contact the Joan Staats Library for information regarding this document.
