GENIE v17 - Generation of count data
Count data are generated to assist with variant interpretation according to the UK Somatic Variant Interpretation Guidelines (S-VIG).
Count details
| Count Type | Present for | Description |
|---|---|---|
| Same nucleotide change | All variants. | Variants are grouped by GRCh38 CHROM, POS, REF and ALT, and the number of unique patients is counted. |
| Same amino acid change | All variants with HGVSp notation present. | Variants are grouped by Hugo_Symbol, Transcript_ID and HGVSp, and the number of unique patients is counted. |
| Downstream frameshift (truncating) | Any variants with a Variant_Classification of Frame_Shift_Del, Frame_Shift_Ins or Nonsense_Mutation which have "Ter" in the HGVSp notation. | The first position is extracted from the HGVSc notation, and the number of unique patients with downstream frameshift (truncating) variants with the same Hugo_Symbol and Transcript_ID is counted. |
| Inframe deletions | Any variants with a Variant_Classification of In_Frame_Del with HGVSp notation present. | For each variant, the deletion range is extracted from the HGVSp notation, and the number of unique patients with an inframe deletion with the same Hugo_Symbol and Transcript_ID nested within the range is counted. |
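
The two position-based counts can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. The record layout, the `patient_id` field and all function names are hypothetical. It also assumes that "downstream" means a first HGVSc position greater than or equal to the variant's own, and that "nested" means a deletion whose range lies entirely inside the variant's range; the source does not state either boundary rule. Variants without a plain numeric coding position (for example UTR positions such as `c.-12`) are skipped.

```python
import re
from collections import defaultdict


def first_cdna_position(hgvsc):
    """Return the first coding position in an HGVSc string, or None if absent."""
    match = re.search(r"c\.(\d+)", hgvsc)
    return int(match.group(1)) if match else None


def deletion_range(hgvsp):
    """Return (start, end) residue numbers of a deletion in an HGVSp string."""
    match = re.search(r"p\.[A-Za-z]+(\d+)(?:_[A-Za-z]+(\d+))?del", hgvsp)
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return start, end


def _by_transcript(variants, extract, column):
    """Group (extracted value, patient) pairs by gene and transcript."""
    groups = defaultdict(list)
    for v in variants:
        value = extract(v[column])
        if value is not None:
            key = (v["Hugo_Symbol"], v["Transcript_ID"])
            groups[key].append((value, v["patient_id"]))
    return groups


def downstream_truncating_counts(truncating_variants):
    """Unique patients with a truncating variant at or after each position.

    Input is assumed to be pre-filtered to truncating variants.
    """
    counts = {}
    groups = _by_transcript(truncating_variants, first_cdna_position, "HGVSc")
    for key, entries in groups.items():
        for pos in {p for p, _ in entries}:
            counts[key + (pos,)] = len({pt for p, pt in entries if p >= pos})
    return counts


def nested_inframe_deletion_counts(inframe_deletions):
    """Unique patients with an in-frame deletion nested within each range.

    Input is assumed to be pre-filtered to in-frame deletions.
    """
    counts = {}
    groups = _by_transcript(inframe_deletions, deletion_range, "HGVSp")
    for key, entries in groups.items():
        for start, end in {r for r, _ in entries}:
            counts[key + (start, end)] = len(
                {pt for (s, e), pt in entries if start <= s and e <= end}
            )
    return counts


# Illustrative records only; gene, transcript and patient values are made up.
truncating = [
    {"Hugo_Symbol": "GENE1", "Transcript_ID": "T1", "HGVSc": "c.100del", "patient_id": "P1"},
    {"Hugo_Symbol": "GENE1", "Transcript_ID": "T1", "HGVSc": "c.250dup", "patient_id": "P2"},
    {"Hugo_Symbol": "GENE1", "Transcript_ID": "T1", "HGVSc": "c.50_51del", "patient_id": "P3"},
]
inframe = [
    {"Hugo_Symbol": "GENE2", "Transcript_ID": "T2", "HGVSp": "p.Glu746_Ala750del", "patient_id": "P1"},
    {"Hugo_Symbol": "GENE2", "Transcript_ID": "T2", "HGVSp": "p.Leu747_Ala750del", "patient_id": "P2"},
    {"Hugo_Symbol": "GENE2", "Transcript_ID": "T2", "HGVSp": "p.Glu746_Ser752del", "patient_id": "P3"},
]
frameshift_counts = downstream_truncating_counts(truncating)
deletion_counts = nested_inframe_deletion_counts(inframe)
```

Collecting patients into sets before taking the length means a patient who carries several qualifying variants in the same transcript contributes once to each count.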
Counting principles
A variant can be present multiple times for the same patient, either for the same cancer type or for different cancer types. For example, consider a patient who has two tests performed:
- Test 1: Ovarian Cancer
- Test 2: Leukemia
All variants from test 1 for this patient go into the counts for cancer 1 (Ovarian Cancer) and all variants from test 2 go into the counts for cancer 2 (Leukemia). However, counts across all cancers (and grouped counts for haemonc or solid cancers) do not double count any variants shared between cancers 1 and 2 for this patient.
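
The counting rule above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the actual implementation: the `patient_id` and `cancer_type` field names and the function name are hypothetical. Patients are collected into sets, so a patient who carries the same variant in two cancer types appears once in each per-cancer count and once in the all-cancers count.

```python
from collections import defaultdict


def count_unique_patients(records):
    """Return (per-cancer counts, all-cancer counts) of unique patients."""
    per_cancer = defaultdict(set)
    all_cancers = defaultdict(set)
    for r in records:
        variant = (r["CHROM"], r["POS"], r["REF"], r["ALT"])
        per_cancer[(variant, r["cancer_type"])].add(r["patient_id"])
        all_cancers[variant].add(r["patient_id"])
    return (
        {key: len(patients) for key, patients in per_cancer.items()},
        {key: len(patients) for key, patients in all_cancers.items()},
    )


# Illustrative records only: P1 carries the same variant in both tests.
variant = {"CHROM": "17", "POS": 7674220, "REF": "C", "ALT": "T"}
records = [
    {**variant, "patient_id": "P1", "cancer_type": "Ovarian Cancer"},
    {**variant, "patient_id": "P1", "cancer_type": "Leukemia"},
    {**variant, "patient_id": "P2", "cancer_type": "Ovarian Cancer"},
]
per_cancer_counts, all_cancer_counts = count_unique_patients(records)
```

Here the variant is counted for two patients in Ovarian Cancer, one patient in Leukemia, and two patients (not three) across all cancers.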
Annotations
Annotations, including transcripts, are taken from the GENIE data. 39 genes are annotated against multiple transcripts:
RUNX1, HIST1H3D, SESN1, TAP2, RUNX1T1, ERRFI1, DYNC1I1, TPM1, PKD1L2, RHOT1, RRAS2, FGF14, SGK1, INPP5D, CIC, PTPRS, SDHD, BCL6, MECOM, IGF1, KCNIP1, CLDN18, CUX1, SQSTM1, FGF5, PTPRB, ABL1, NRG1, MAF, ESR1, ETS1, PGBD5, TERT, SHANK2, FGF6, HSP90AA1, APC, PPARG, HDAC9
Normalisation duplicates
Variants in GENIE are provided by separate submitting institutions and no normalisation is performed within GENIE
itself. Variants were therefore converted from MAF to VCF description (GRCh37) and normalised (left-aligned and made
parsimonious), which identified 3038 variants with a differing representation within GENIE which normalised to the
same GRCh37 variant. For these variants, the Consequence, HGVSc, HGVSp and
Variant_Classification were regenerated against the same Transcript_ID prior to counting.
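
To show why normalisation collapses differing representations, the following is a minimal Python sketch of the commonly described left-align-and-trim procedure for a single REF/ALT pair. The tool actually used is not named in this document, so this is an assumption about the general technique, not a description of the pipeline. The `normalise` function and the toy reference sequence are hypothetical, and positions are 1-based as in VCF.

```python
def normalise(pos, ref, alt, reference):
    """Left-align and trim a biallelic variant against a reference string."""
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    while True:
        if ref and alt and ref[-1] == alt[-1]:
            # Drop a shared trailing base.
            ref, alt = ref[:-1], alt[:-1]
        elif not ref or not alt:
            # An allele became empty: extend both one base to the left.
            if pos == 1:
                raise ValueError("cannot left-extend at the start of the contig")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
        else:
            break
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy contig with a CA repeat; three descriptions of the same CA deletion.
reference = "GGGCACACACAGGG"
right_shifted = normalise(8, "CAC", "C", reference)
padded = normalise(4, "CACA", "CA", reference)
already_normal = normalise(3, "GCA", "G", reference)
```

All three calls return the same position and alleles, which is the situation in which separately submitted variants are recognised as duplicates of one another.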
Variants removed
- 732 variants with a Hugo_Symbol of "Unknown" were removed.
- 3 variants were removed due to a mismatch with the GRCh37 reference when converting from MAF to VCF description.
- 684 variants failed to lift over from GRCh37 to GRCh38 and were removed.
- 2 variants on contig 22_KI270928v1_alt were removed.