FAIR Image Data


National BioImaging standards and FAIR-IO (Image Objects)

Internationally, countless petabytes of biological imaging data are collected each year. Only a fraction can be shared in public domain archives like EBI’s BioImage Archive (BIA) or the Image Data Resource (IDR). Much of the remaining data lacks approachable FAIR infrastructure through which it could be shared. Exacerbating this situation, imaging data tends to be particularly opaque, due to a combination of proprietary file formats, unspecified metadata, and monolithic files which must be downloaded in toto. Yet this vast untapped pool of knowledge could foster education, model training, and new scientific discoveries.

One task area led by the NFDI4BIOIMAGE consortium seeks to make such data open and web-accessible by combining the “FAIR Data Object” (FDO) concept with previous and ongoing efforts within bioimaging. This effort has been tentatively titled “FAIR Image Objects (FAIR-IO)” in the application, as seen in Figure 1.

Fig. 1: A FAIR Image Object combines the necessary acquisition and provenance metadata together with multi-resolution, chunked binary data in a single cloud-compatible format for simplified sharing and re-use. CC-BY: NFDI4BIOIMAGE Consortium. (2021). Zenodo. https://doi.org/10.5281/zenodo.7394675
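
The idea of pairing metadata with multi-resolution, chunked binary data can be illustrated with a minimal sketch. It is not the FAIR-IO format itself: the function names, the `fair_image_object` dictionary layout, the 2×2 averaging, and the chunk size are all assumptions chosen for illustration, and plain Python lists stand in for real image arrays.

```python
def downsample(image):
    """Halve each dimension by averaging 2x2 blocks (odd edges are dropped)."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_pyramid(image, levels):
    """Return [full resolution, half resolution, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def split_into_chunks(image, size):
    """Map a (row, col) chunk index to a tile of at most size x size pixels."""
    chunks = {}
    for y in range(0, len(image), size):
        for x in range(0, len(image[0]), size):
            chunks[(y // size, x // size)] = [row[x:x + size] for row in image[y:y + size]]
    return chunks


# Toy 8x8 image; a real acquisition would be far larger.
image = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, levels=3)

# Hypothetical container: metadata and chunked pyramid levels travel together.
fair_image_object = {
    "metadata": {"Imaging Method": "line-scanning confocal fluorescence microscopy"},
    "levels": [split_into_chunks(level, size=4) for level in pyramid],
}
```

Because every resolution level is stored as independent chunks, a viewer could request only the tiles it needs, for example `fair_image_object["levels"][0][(1, 1)]`, instead of downloading a monolithic file.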

To become a FAIR data object, imaging data first needs an open and accessible metadata representation.

Based on community workshops by a group of international imaging scientists, the REMBI guidelines, a set of recommended metadata annotations, were published in 2021 (Sarkans et al., 2021). A key idea was that metadata should contain information that is relevant to three main groups of science professionals:

  • Biologists using microscopes for their research
  • Computer vision researchers interested in image processing and analysis
  • Imaging scientists working on the development of imaging techniques

To cover these needs, the authors proposed 35 metadata items that should, at a minimum, be annotated for imaging data. Each item is a “Key” combined with a discrete value that specifies the information for this Key (a “Key-Value Pair” annotation). For example, one key item is “Imaging Method”. For a user annotating data, the value for this item would be the exact microscopy technique used to record the image, for example, “line-scanning confocal fluorescence microscopy”. The 35 items are grouped into eight categories. Each item is described in the REMBI publication, and examples are provided. To prevent items from being annotated with arbitrary terms, the authors suggest controlled vocabularies or ontologies from which the annotated values should be taken.
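
A key-value pair annotation checked against a controlled vocabulary might look like the following minimal sketch. The `KeyValueAnnotation` class, the `validate` helper, and the small `CONTROLLED_TERMS` table are hypothetical; the listed terms are illustrative and are not copied from any real ontology or from the REMBI publication.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: illustrative terms only.
CONTROLLED_TERMS = {
    "Imaging Method": {
        "line-scanning confocal fluorescence microscopy",
        "serial block-face scanning electron microscopy",
    },
}


@dataclass(frozen=True)
class KeyValueAnnotation:
    key: str
    value: str


def validate(annotation, vocabulary=CONTROLLED_TERMS):
    """Accept a value if its key has no vocabulary or the value is a listed term."""
    allowed = vocabulary.get(annotation.key)
    return allowed is None or annotation.value in allowed


annotations = [
    KeyValueAnnotation("Imaging Method", "line-scanning confocal fluorescence microscopy"),
    KeyValueAnnotation("Imaging Method", "some fancy microscope"),  # arbitrary term
]
results = [validate(annotation) for annotation in annotations]
```

Here the first annotation passes and the second is rejected, which is the behaviour a controlled vocabulary is meant to enforce: free-text values that no one else can search for are caught at annotation time.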

The REMBI metadata standards can be used to annotate individual images. Still, several REMBI items are best suited to annotating datasets from an experiment or even a series of experiments within a study. REMBI has a flat structure and offers beginners a good orientation on what to consider when annotating microscopy metadata. An example of how REMBI could be used to annotate metadata in an OMERO database was proposed by the RDMbites team of ELIXIR-UK: https://www.youtube.com/watch?v=3J5zqqO9LNs

The UFZ follows the Recommended Metadata for Biological Images (REMBI) schema reported here to annotate and share image metadata. For more details on the schema, see here.

REMBI SCHEMA: The “study” module describes the top-level metadata elements, in alignment with existing generic standards such as Dublin Core, DataCite Metadata, and schema.org. For example, in a correlative study comprising serial block-face scanning electron microscopy (SBF-SEM) and confocal images, one of the study components would contain all information on the EM image stack, the other study component would correspond to the confocal stack, and a transformation description would allow an overlay of the two types of image. Data that retain spatial fidelity to underlying images (for example, label maps, volume renderings) are described in the “image data” module, whereas “analyzed data” (for example, volumetric analyses, image segment features, counts) contains image-derived measurements, typically presented in tabular form.
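
The correlative example in this caption can be sketched as a small data structure. This is a simplification under stated assumptions: the class names, field names, and the nesting of image data and analyzed data under each study component are hypothetical and do not reproduce the full REMBI module list.

```python
from dataclasses import dataclass, field


@dataclass
class StudyComponent:
    name: str
    imaging_method: str
    image_data: list = field(default_factory=list)     # e.g. image stacks, label maps
    analyzed_data: list = field(default_factory=list)  # e.g. tabular measurements


@dataclass
class Study:
    title: str
    components: list = field(default_factory=list)
    transformations: list = field(default_factory=list)  # link components spatially


# Hypothetical correlative study with one component per modality.
study = Study(title="Correlative SBF-SEM and confocal study")
study.components.append(
    StudyComponent(
        name="EM stack",
        imaging_method="serial block-face scanning electron microscopy",
        image_data=["em_stack"],
    )
)
study.components.append(
    StudyComponent(
        name="confocal stack",
        imaging_method="confocal microscopy",
        image_data=["confocal_stack"],
    )
)
# The transformation description is what allows the two stacks to be overlaid.
study.transformations.append(
    {"source": "confocal stack", "target": "EM stack", "description": "overlay transform"}
)
```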

Recommended metadata in bioimaging serve three user groups. Biologists and life scientists are interested in repeating experiments, (re-)analyzing or comparing bioimage data, and understanding results. For this, they need detailed information on the experimental context, such as the composition of biological samples, molecular entities, experimental interventions (for example, control vs. treatment) and how these relate to the image data. Imaging scientists (microscopists and technology developers) are interested in developing new imaging technologies. For this, they need detailed information on the image-acquisition process, such as physical properties of the image-acquisition set-up, and may benefit from some high-level information on the biological problem at hand. Computer-vision researchers develop algorithms (not limited to biological applications). Depending on the objective, they may need any of the information listed above. For example, to train a machine-learning algorithm, they would need ‘ground truth’ information such as adequately labeled images with categories (for example, control vs. treatments/phenotypes) or object outlines (segmentations).

Resources