Introduction

A recurring theme in the analysis of biomedical data is the need to describe this data in machine-readable form. Two important areas where this need arises are:

  1. storing metadata for the results of an experiment or assay, and
  2. processing biomedical data sets in an automated and reproducible way using a workflow engine.

This document proposes a way to structure such data, i.e., a data schema that can represent the most important use cases encountered by the Core Unit Bioinformatics (CUBI) at the Berlin Institute of Health (BIH). It also proposes shortcuts and simplifications that make the schema easier to navigate and use for a handful of key use cases.

The schema was originally adapted from the data schema used by Sven Nahnsen’s group in Tübingen.