Table Fields¶
Core Table Fields¶
This section describes the common table fields. Generally, the pk field is an integer primary key that is generated automatically (i.e., autoincrement in an RDBMS). The secondary_id field is an identifier assigned by the “data owner” (e.g., the collaboration partner). This identifier has to be unique within a given project but need not be unique globally.
A possible best practice is to restrict each component of a secondary_id to alphanumeric characters and underscores and to join the components with hyphens, so none of the <Field> values may contain a hyphen itself:
<BioEntity>-<BioSample>-<TestSample>-<NGSLibrary>
Each record type uses only the components up to its own level (e.g., only up to “BioSample” for BioSamples).
Examples are:
- BioEntity secondary ids: 2355, BIH_234
- BioSample secondary ids:
  - 2355-B1 (first blood sample from patient 2355)
  - BIH_234-N1 (first normal sample from patient BIH_234)
  - BIH_234-T2 (second tumor sample from patient BIH_234)
- TestSample secondary ids:
  - 2355-B1-DNA1 (first DNA extraction from first blood sample)
  - BIH_234-T1-RNA1 (first RNA extraction from first tumor sample)
  - BIH_234-T2-DNA2 (second DNA extraction from second tumor sample)
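The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming that every component is restricted to ASCII letters, digits, and underscores; the function names `build_secondary_id` and `split_secondary_id` are hypothetical and not part of any existing API.

```python
import re

# Assumption: a component may only contain ASCII letters, digits and underscores.
_COMPONENT = re.compile(r"[A-Za-z0-9_]+")

# Levels in the order in which they appear in a secondary_id.
LEVELS = ("BioEntity", "BioSample", "TestSample", "NGSLibrary")


def build_secondary_id(*components: str) -> str:
    """Join one to four validated components with hyphens."""
    if not 1 <= len(components) <= len(LEVELS):
        raise ValueError("expected between 1 and 4 components")
    for level, value in zip(LEVELS, components):
        if not _COMPONENT.fullmatch(value):
            raise ValueError(f"invalid {level} component: {value!r}")
    return "-".join(components)


def split_secondary_id(secondary_id: str) -> dict:
    """Map each hyphen-separated component to the name of its level."""
    components = secondary_id.split("-")
    build_secondary_id(*components)  # reuse the validation above
    return dict(zip(LEVELS, components))
```

For example, `build_secondary_id("BIH_234", "T2", "DNA2")` yields the TestSample identifier `BIH_234-T2-DNA2`, while a component such as `BIH-234` is rejected because it contains a hyphen.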
Generally, the following are “core fields”.
BioEntity¶
- pk: integer
- secondary_id: string
BioSample¶
- pk: integer
- bio_entity: fk to BioEntity.pk
- secondary_id: string
TestSample¶
- pk: integer
- bio_sample: fk to BioSample.pk
- secondary_id: string
NGSLibrary¶
- pk: integer
- test_sample: fk to TestSample.pk
- secondary_id: string
FlowCell¶
- pk: integer
- machine_name: string
- flowcell_name: string
NGSLibraryOnFlowCell¶
- pk: integer
- ngs_library: fk to NGSLibrary.pk
- flowcell: fk to FlowCell.pk
- lane: integer
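As an illustration, the core tables could be laid out in a relational database as follows. This is a sketch using SQLite through Python's `sqlite3` module; the snake_case table names and the `NOT NULL` constraints are assumptions made for the example, and the per-project uniqueness of secondary_id is not modeled because the core fields do not include a project reference.

```python
import sqlite3

# Assumption: table names and NOT NULL constraints are illustrative only.
SCHEMA = """
CREATE TABLE bio_entity (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    secondary_id TEXT NOT NULL
);
CREATE TABLE bio_sample (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    bio_entity INTEGER NOT NULL REFERENCES bio_entity(pk),
    secondary_id TEXT NOT NULL
);
CREATE TABLE test_sample (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    bio_sample INTEGER NOT NULL REFERENCES bio_sample(pk),
    secondary_id TEXT NOT NULL
);
CREATE TABLE ngs_library (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    test_sample INTEGER NOT NULL REFERENCES test_sample(pk),
    secondary_id TEXT NOT NULL
);
CREATE TABLE flow_cell (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    machine_name TEXT NOT NULL,
    flowcell_name TEXT NOT NULL
);
CREATE TABLE ngs_library_on_flow_cell (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    ngs_library INTEGER NOT NULL REFERENCES ngs_library(pk),
    flowcell INTEGER NOT NULL REFERENCES flow_cell(pk),
    lane INTEGER NOT NULL
);
"""


def create_core_tables() -> sqlite3.Connection:
    """Create the core tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_core_tables()
# The pk is generated by the database; only the secondary_id is supplied.
cur = conn.execute(
    "INSERT INTO bio_entity (secondary_id) VALUES (?)", ("BIH_234",)
)
entity_pk = cur.lastrowid
conn.execute(
    "INSERT INTO bio_sample (bio_entity, secondary_id) VALUES (?, ?)",
    (entity_pk, "BIH_234-T2"),
)
```

With foreign keys enabled, inserting a BioSample that points to a non-existent BioEntity.pk fails with `sqlite3.IntegrityError`.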
Common Table Fields¶
For many major use cases, the following fields are useful additions to the core fields; together they form the “common fields”.
For all tables, adding a list of strings with external IDs (e.g., called “external_ids”) is recommended. This allows linking out to external resources. A recommendation is to write these IDs as URLs so that each one has an unambiguous prefix. These URLs can be pseudo URLs or real entry points in remote REST APIs. Further, each record has a meta_data field for structured data in JSON format.
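To make the idea of URL-style external IDs concrete, here is a small sketch of a record carrying `external_ids` and `meta_data`. The URLs, the `lims` pseudo scheme, the `meta_data` keys, and the helper `external_id_prefix` are all hypothetical and chosen only for illustration.

```python
import json
from urllib.parse import urlsplit

# Hypothetical record; the URLs and meta_data keys are invented for this example.
record = {
    "pk": 1,
    "secondary_id": "BIH_234",
    "external_ids": [
        "https://lims.example.org/patients/234",  # real-looking REST entry point
        "lims://partner-a/patients/2355",         # pseudo URL
    ],
    "meta_data": {"consent": "research", "tags": ["pilot"]},
}


def external_id_prefix(external_id: str) -> str:
    """Return the scheme and host part that identifies the external resource."""
    parts = urlsplit(external_id)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not a URL-style external ID: {external_id!r}")
    return f"{parts.scheme}://{parts.netloc}"


# meta_data and external_ids survive a JSON round trip unchanged.
serialized = json.dumps(record)
restored = json.loads(serialized)
prefixes = [external_id_prefix(x) for x in restored["external_ids"]]
```

The prefix tells a consumer which external system an ID belongs to, so two partners can both use the ID `234` without a clash.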
BioEntity¶
- affected: boolean, optional field for specifying the “affected” state in rare disease studies
- sex: {‘male’, ‘female’, ‘unknown’}, optional field for person’s sex in germline studies
- father: fk to BioEntity.pk, optional field for linking to father
- mother: fk to BioEntity.pk, optional field for linking to mother
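The BioEntity fields above are enough to describe a simple pedigree. The sketch below models a parent-child trio with a Python dataclass; the class layout, the helper `parents_of`, and the secondary IDs are assumptions for illustration, and `None` stands for an optional field that was left unset.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Allowed values for the optional sex field, as listed above.
SEX_VALUES = {"male", "female", "unknown"}


@dataclass
class BioEntity:
    pk: int
    secondary_id: str
    affected: Optional[bool] = None
    sex: Optional[str] = None
    father: Optional[int] = None  # refers to BioEntity.pk
    mother: Optional[int] = None  # refers to BioEntity.pk

    def __post_init__(self) -> None:
        if self.sex is not None and self.sex not in SEX_VALUES:
            raise ValueError(f"unknown sex value: {self.sex!r}")


def parents_of(
    entity: BioEntity, by_pk: Dict[int, BioEntity]
) -> Tuple[Optional[BioEntity], Optional[BioEntity]]:
    """Resolve the father and mother references; missing links give None."""
    father = by_pk.get(entity.father) if entity.father is not None else None
    mother = by_pk.get(entity.mother) if entity.mother is not None else None
    return father, mother


# Hypothetical trio: unaffected parents and an affected index patient.
father = BioEntity(pk=1, secondary_id="F1_father", affected=False, sex="male")
mother = BioEntity(pk=2, secondary_id="F1_mother", affected=False, sex="female")
index = BioEntity(
    pk=3, secondary_id="F1_index", affected=True, sex="female", father=1, mother=2
)
by_pk = {e.pk: e for e in (father, mother, index)}
```

Calling `parents_of(index, by_pk)` returns the father and mother records, while founders without parent links yield `(None, None)`.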
BioSample¶
- cell_type: string with controlled vocabulary, optional field for specifying cell type
TestSample¶
- extraction_type: controlled vocabulary with extraction type, e.g. {‘DNA’, ‘RNA’} or a superset thereof; optional field for describing extracted data
NGSLibrary¶
- library_kind: controlled vocabulary with library preparation type, e.g., {‘WES’, ‘WGS’, ‘RNA-seq’, ‘other’} or a superset thereof; required field for describing library type
- kit: controlled vocabulary describing kit and version used for targeted sequencing, or RNA amplification method
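One way to enforce a controlled vocabulary in application code is an enumeration. The following sketch covers `extraction_type` and `library_kind` with exactly the example values listed above; the class names and the helper `parse_library_kind` are hypothetical, and a real deployment may use a superset of these values.

```python
from enum import Enum


class ExtractionType(Enum):
    DNA = "DNA"
    RNA = "RNA"


class LibraryKind(Enum):
    WES = "WES"
    WGS = "WGS"
    RNA_SEQ = "RNA-seq"
    OTHER = "other"


def parse_library_kind(value: str) -> LibraryKind:
    """Look up a library kind by its stored string; unknown values raise ValueError."""
    return LibraryKind(value)
```

Because library_kind is a required field, rejecting unknown strings at parse time keeps free-text variants such as `rnaseq` out of the database.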
NGSLibraryOnFlowCell¶
- adapter_name: string, optional field describing name of used adapter barcode(s)
- adapter_seq: string, optional field giving sequence of used adapter barcode(s)