Matched Tumor Samples
A relatively simple schema for the analysis of matched tumor/normal samples from cancer studies. The assumed setting is as follows.
- Each bio entity is a patient/donor.
- Each donor gives one normal (bio) sample (e.g., blood or saliva) and at least one (bio) sample from the cancer (e.g., primary tumor or metastasis).
- For each tumor and non-tumor sample, there is at least one DNA HTS library sequenced.
- For each tumor sample, there can be RNA HTS libraries.
- Only the first DNA/RNA library seen for each sample is considered (the “primary one”).
Note
The requirement of one DNA HTS library for each sample and RNA only for tumor can be dropped in the future.
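The “primary library” rule from the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the function name `pick_primary_libraries`, the tuple layout of the rows, and the mapping of library types to DNA/RNA (including the `WGS` entry) are hypothetical and chosen only for this example.

```python
# Assumed mapping from library type to molecule class (illustrative only).
LIBRARY_KIND = {"WES": "DNA", "WGS": "DNA", "mRNA-seq": "RNA"}


def pick_primary_libraries(rows):
    """Return the first-seen DNA and RNA library folder for each sample.

    `rows` is an iterable of (patient, sample, library_type, folder)
    tuples in the order in which they appear in the sheet.
    """
    primary = {}
    for patient, sample, library_type, folder in rows:
        kind = LIBRARY_KIND.get(library_type)
        if kind is None:
            continue  # unknown library type: ignored in this sketch
        # setdefault keeps the first value seen and ignores later ones.
        primary.setdefault((patient, sample, kind), folder)
    return primary


rows = [
    ("P001", "T1", "WES", "P001-T1-DNA1-WES1"),
    ("P001", "T1", "WES", "P001-T1-DNA1-WES2"),  # later DNA library, not primary
    ("P001", "T1", "mRNA-seq", "P001-T1-RNA1-mRNAseq1"),
]
primary = pick_primary_libraries(rows)
print(primary[("P001", "T1", "DNA")])  # P001-T1-DNA1-WES1
```

Keying the result on (patient, sample, kind) lets a tumor sample carry one primary DNA library and one primary RNA library at the same time.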
Matched Tumor Fields
The following fields must be present for matched tumor sample sheets.
- BioSample
  - isTumor – a boolean defining whether the sample was taken from tumor cells
Matched Tumor TSV Schema
As an alternative to defining matched tumor sample sheets in JSON format, a TSV-based schema can be used.
Optionally, the sheet can contain metadata, introduced by a [Metadata] INI-style section header. In that case, the data section has to start with a [Data] header.
[Metadata]
schema cancer_matched
schema_version v1
title Example matched cancer tumor/normal study
description The study has two patients, P001 has one tumor sample, P002 has two
[Data]
The schema and schema_version lines are optional.
If the file does not start with an INI-style section header, it starts with tab-separated column names. An example is shown below:
patientName sampleName isTumor libraryType folderName
P001 N1 N WES P001-N1-DNA1-WES1
P001 T1 Y WES P001-T1-DNA1-WES1
P001 T1 Y mRNA-seq P001-T1-RNA1-mRNAseq1
P002 N1 N WES P002-N1-DNA1-WES1
P002 T1 Y WES P002-T1-DNA1-WES1
P002 T1 Y mRNA-seq P002-T1-RNA1-RNAseq1
P002 T2 Y WES P002-T2-DNA1-WES1
P002 T2 Y mRNA-seq P002-T2-RNA1-mRNAseq1
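Reading such a file, with or without the section headers, could look roughly like the sketch below. It is an illustration only: the function name `read_sheet` is hypothetical, and the sketch assumes that metadata lines separate key and value with a tab and that blank lines can be skipped.

```python
import csv
import io


def read_sheet(text):
    """Split a TSV sheet into (metadata, rows).

    Handles both layouts: with [Metadata]/[Data] section headers, or
    starting directly with the tab-separated column names.
    """
    metadata = {}
    data_lines = []
    section = "Data"  # a file without section headers is all data
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]
        elif section == "Metadata":
            # Assumption: key and value are separated by the first tab.
            key, _, value = stripped.partition("\t")
            metadata[key] = value
        elif section == "Data":
            data_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t")
    return metadata, list(reader)


sheet = (
    "[Metadata]\n"
    "schema\tcancer_matched\n"
    "title\tExample matched cancer tumor/normal study\n"
    "[Data]\n"
    "patientName\tsampleName\tisTumor\tlibraryType\tfolderName\n"
    "P001\tN1\tN\tWES\tP001-N1-DNA1-WES1\n"
    "P001\tT1\tY\tWES\tP001-T1-DNA1-WES1\n"
)
metadata, rows = read_sheet(sheet)
```

Here `metadata` holds the key/value pairs of the [Metadata] section and `rows` is a list of dictionaries keyed by column name, one per library line.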
The columns are as follows:
- patientName – name of the patient, used for identifying the patient in the sample sheet.
- sampleName – name of the sample, used for identifying the sample for the patient in the sample sheet (the combination of patient and sample must be unique in the sheet).
- isTumor – a flag identifying a sample as being from tumor, one of {Y, N, 1, 0}
- extractionType – a valid extraction type as in the JSON schema
- libraryType – a valid libraryType, as in the JSON schema
- folderName – a folder name to search the library’s FASTQ files for. A list of base folders to search for the folder names is given in the configuration, so no full path is given here.
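As an illustration of how the isTumor flag and the matched tumor/normal setting might be checked, consider the sketch below. The helper names `parse_is_tumor` and `check_matched` are hypothetical. The sketch assumes that the flag values are case-sensitive and that “one normal sample” means exactly one per patient.

```python
from collections import defaultdict


def parse_is_tumor(value):
    """Map the isTumor column values Y/N/1/0 to a boolean."""
    flags = {"Y": True, "1": True, "N": False, "0": False}
    try:
        return flags[value.strip()]
    except KeyError:
        raise ValueError(
            f"isTumor must be one of Y, N, 1, 0; got {value!r}"
        ) from None


def check_matched(rows):
    """Return patients without exactly one normal and at least one tumor sample."""
    normals = defaultdict(set)
    tumors = defaultdict(set)
    for row in rows:
        target = tumors if parse_is_tumor(row["isTumor"]) else normals
        # Several library lines may refer to the same sample, hence a set.
        target[row["patientName"]].add(row["sampleName"])
    patients = set(normals) | set(tumors)
    return sorted(
        p for p in patients if len(normals[p]) != 1 or len(tumors[p]) < 1
    )


rows = [
    {"patientName": "P001", "sampleName": "N1", "isTumor": "N"},
    {"patientName": "P001", "sampleName": "T1", "isTumor": "Y"},
    {"patientName": "P001", "sampleName": "T1", "isTumor": "1"},
    {"patientName": "P002", "sampleName": "T1", "isTumor": "Y"},
]
problems = check_matched(rows)
print(problems)  # ['P002']
```

Collecting sample names in sets means that a sample listed on several lines, once per library, is counted only once.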
Note that the names of the TestSample and NGSLibrary entities are missing; they will be auto-generated based on the extractionType and libraryType.
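The exact naming rule is not spelled out here. Purely as an illustration, the sketch below generates names that mirror the pattern of the folder names in the example sheet (such as P001-T1-DNA1-WES1). The function `generate_names`, the mapping from library type to extraction type, and the numbering scheme are all assumptions made for this example and may differ from the real behavior.

```python
from collections import Counter

# Assumed mapping from library type to extraction type (illustrative only).
EXTRACTION_FOR = {"WES": "DNA", "mRNA-seq": "RNA"}


def generate_names(rows):
    """Generate (test_sample_name, library_name) pairs.

    `rows` is a list of (patient, sample, library_type) tuples in sheet order.
    """
    counts = Counter()
    names = []
    for patient, sample, library_type in rows:
        extraction = EXTRACTION_FOR[library_type]
        # Assumption: one test sample per extraction type, numbered 1.
        test_sample = f"{patient}-{sample}-{extraction}1"
        # Number libraries of the same type within one test sample.
        counts[(test_sample, library_type)] += 1
        suffix = library_type.replace("-", "")
        number = counts[(test_sample, library_type)]
        names.append((test_sample, f"{test_sample}-{suffix}{number}"))
    return names


names = generate_names(
    [("P001", "T1", "WES"), ("P001", "T1", "mRNA-seq"), ("P001", "T1", "WES")]
)
```

With this input, the second WES library of the same sample receives the suffix WES2, while the test sample name stays P001-T1-DNA1.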
Optionally, the following fields can be added:
- seqPlatform – can be one of Illumina and PacBio; the default is Illumina
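Handling the optional column could be sketched as below. The helper `get_seq_platform` is hypothetical, and treating an empty cell like a missing column is an assumption of this sketch.

```python
VALID_PLATFORMS = ("Illumina", "PacBio")
DEFAULT_PLATFORM = "Illumina"


def get_seq_platform(row):
    """Return the row's seqPlatform, falling back to the default."""
    # A missing column and an empty cell both yield the default here.
    platform = row.get("seqPlatform") or DEFAULT_PLATFORM
    if platform not in VALID_PLATFORMS:
        raise ValueError(
            f"seqPlatform must be one of {VALID_PLATFORMS}; got {platform!r}"
        )
    return platform


print(get_seq_platform({"patientName": "P001"}))  # Illumina
print(get_seq_platform({"seqPlatform": "PacBio"}))  # PacBio
```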