GENIE v17 - Generation of count data


Count data are generated to assist with variant interpretation according to the UK Somatic Variant Interpretation Guidelines (S-VIG).

Count details

Same nucleotide change
  • Present for: All variants.
  • Description: Variants are grouped by GRCh38 CHROM, POS, REF and ALT, and the number of unique patients is counted.

Same amino acid change
  • Present for: All variants with HGVSp notation present.
  • Description: Variants are grouped by Hugo_Symbol, Transcript_ID and HGVSp, and the number of unique patients is counted.

Downstream frameshift (truncating)
  • Present for: Any variant with a Variant_Classification of Frame_Shift_Del, Frame_Shift_Ins or Nonsense_Mutation that has "Ter" in the HGVSp notation.
  • Description: The first position is extracted from the HGVSc notation, and the number of unique patients with downstream frameshift (truncating) variants with the same Hugo_Symbol and Transcript_ID is counted.

Inframe deletions
  • Present for: Any variant with a Variant_Classification equal to In_Frame_Del and with HGVSp notation present.
  • Description: For each variant, the deletion range is extracted from the HGVSp notation, and the number of unique patients with an inframe deletion nested within that range, with the same Hugo_Symbol and Transcript_ID, is counted.

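
The count types above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used to produce the published counts: the record layout (dictionaries with a hypothetical patient_id field alongside the MAF column names), the regular expressions, and the choice to treat "downstream" and "nested" as inclusive of the variant's own position or range are all assumptions.

```python
import re
from collections import defaultdict

TRUNCATING = {"Frame_Shift_Del", "Frame_Shift_Ins", "Nonsense_Mutation"}

def count_unique_patients(records, keys):
    """Group records by the given fields and count unique patients per group."""
    patients = defaultdict(set)
    for rec in records:
        patients[tuple(rec[k] for k in keys)].add(rec["patient_id"])
    return {group: len(ids) for group, ids in patients.items()}

def first_coding_position(hgvsc):
    """First position in an HGVSc string, e.g. 'c.1234_1235del' gives 1234."""
    match = re.search(r"c\.(\d+)", hgvsc)
    return int(match.group(1)) if match else None

def deletion_range(hgvsp):
    """Residue range of a deletion, e.g. 'p.Glu746_Ala750del' gives (746, 750)."""
    match = re.search(r"p\.[A-Za-z]+(\d+)(?:_[A-Za-z]+(\d+))?del", hgvsp)
    if match is None:
        return None
    start = int(match.group(1))
    return start, int(match.group(2) or start)

def same_transcript(a, b):
    """True when two records share Hugo_Symbol and Transcript_ID."""
    return (a["Hugo_Symbol"], a["Transcript_ID"]) == (b["Hugo_Symbol"], b["Transcript_ID"])

def count_downstream_truncating(variant, records):
    """Unique patients with a truncating variant at or after this variant's position.

    Assumption: the variant's own position counts as downstream (inclusive).
    """
    start = first_coding_position(variant["HGVSc"])
    if start is None:
        return 0
    patients = set()
    for rec in records:
        if (rec["Variant_Classification"] in TRUNCATING
                and "Ter" in rec["HGVSp"] and same_transcript(rec, variant)):
            pos = first_coding_position(rec["HGVSc"])
            if pos is not None and pos >= start:
                patients.add(rec["patient_id"])
    return len(patients)

def count_nested_inframe_deletions(variant, records):
    """Unique patients with an in-frame deletion nested within this variant's range.

    Assumption: a deletion with an identical range counts as nested (inclusive).
    """
    outer = deletion_range(variant["HGVSp"])
    if outer is None:
        return 0
    patients = set()
    for rec in records:
        if rec["Variant_Classification"] == "In_Frame_Del" and same_transcript(rec, variant):
            inner = deletion_range(rec["HGVSp"])
            if inner and outer[0] <= inner[0] and inner[1] <= outer[1]:
                patients.add(rec["patient_id"])
    return len(patients)
```

Passing keys=("CHROM", "POS", "REF", "ALT") to count_unique_patients gives the same nucleotide change count, and keys=("Hugo_Symbol", "Transcript_ID", "HGVSp") gives the same amino acid change count, provided the records carry those fields.
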
Counting principles

The same variant can be recorded more than once for a patient, either for the same cancer type or for different cancer types. For example, consider a patient who has had two tests performed:

  1. Ovarian Cancer
  2. Leukemia

All variants from test 1 for this patient go into the counts for cancer 1 (Ovarian Cancer) and all variants from test 2 go into the counts for cancer 2 (Leukemia). However, the counts across all cancers (and the grouped counts for haemonc or solid cancers) count this patient only once for any variant shared between cancers 1 and 2.
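
The counting principle can be illustrated with sets of patient identifiers, which prevent double counting by construction. This is a sketch only: the tuple layout of the observations, the CANCER_GROUP mapping and the variant label "var_A" are hypothetical and not taken from the GENIE data.

```python
from collections import defaultdict

# Hypothetical mapping from cancer type to a broader group.
CANCER_GROUP = {"Ovarian Cancer": "solid", "Leukemia": "haemonc"}

def count_patients(observations):
    """Count unique patients per variant, by cancer type, by group and overall.

    observations: iterable of (patient_id, cancer_type, variant) tuples.
    """
    per_cancer = defaultdict(set)
    per_group = defaultdict(set)
    all_cancers = defaultdict(set)
    for patient_id, cancer_type, variant in observations:
        per_cancer[(variant, cancer_type)].add(patient_id)
        per_group[(variant, CANCER_GROUP[cancer_type])].add(patient_id)
        all_cancers[variant].add(patient_id)

    def sizes(table):
        return {key: len(ids) for key, ids in table.items()}

    return sizes(per_cancer), sizes(per_group), sizes(all_cancers)

observations = [
    ("P1", "Ovarian Cancer", "var_A"),  # test 1
    ("P1", "Leukemia", "var_A"),        # test 2, same patient and variant
    ("P2", "Leukemia", "var_A"),
]
per_cancer, per_group, all_cancers = count_patients(observations)
print(per_cancer[("var_A", "Ovarian Cancer")])  # 1
print(per_cancer[("var_A", "Leukemia")])        # 2
print(all_cancers["var_A"])                     # 2, patient P1 is counted once
```
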

Annotations

Annotations, including transcripts, are taken from the GENIE data. The following 39 genes are annotated against more than one transcript:

RUNX1, HIST1H3D, SESN1, TAP2, RUNX1T1, ERRFI1, DYNC1I1, TPM1, PKD1L2, RHOT1, RRAS2, FGF14, SGK1, INPP5D, CIC, PTPRS, SDHD, BCL6, MECOM, IGF1, KCNIP1, CLDN18, CUX1, SQSTM1, FGF5, PTPRB, ABL1, NRG1, MAF, ESR1, ETS1, PGBD5, TERT, SHANK2, FGF6, HSP90AA1, APC, PPARG, HDAC9

Normalisation duplicates

Variants in GENIE are provided by separate submitting institutions and no normalisation is performed within GENIE itself. Variants were therefore converted from MAF to VCF representation (GRCh37) and normalised (left-aligned and made parsimonious). This identified 3038 variants that have differing representations within GENIE but normalise to the same GRCh37 variant. For these variants, the Consequence, HGVSc, HGVSp and Variant_Classification were regenerated against the same Transcript_ID prior to counting.
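
To show why normalisation exposes duplicates, the sketch below left-aligns and trims single-ALT variants against a short toy sequence, then groups the inputs that collapse to the same normalised form. It follows the commonly described trim-and-extend procedure for VCF normalisation and is not necessarily the tool or exact method used here; the function names, the toy sequence and the omission of chromosome handling are assumptions made for illustration.

```python
from collections import defaultdict

def normalise(pos, ref, alt, sequence):
    """Left-align and trim one variant; pos is 1-based within sequence."""
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 1:
                # Toy limitation: variants at the very start are not handled.
                raise ValueError("cannot extend left of the sequence start")
            pos -= 1
            base = sequence[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

def find_duplicate_representations(variants, sequence):
    """Map each normalised variant to its inputs when more than one collapses to it."""
    groups = defaultdict(set)
    for pos, ref, alt in variants:
        groups[normalise(pos, ref, alt, sequence)].add((pos, ref, alt))
    return {norm: reps for norm, reps in groups.items() if len(reps) > 1}

# Toy reference: three descriptions of the same CA deletion in a CA repeat,
# plus one padded single-base substitution.
sequence = "GGGCACACACAGGG"
variants = [(8, "CAC", "C"), (5, "ACA", "A"), (3, "GCACA", "GCA"), (4, "CA", "CC")]
duplicates = find_duplicate_representations(variants, sequence)
print(duplicates)
```

In this toy case the three deletion descriptions all normalise to position 3, REF GCA, ALT G, so they are reported as one group, while the substitution normalises to position 5, A>C, and has no duplicate.
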

Variants removed
  • 732 variants with a Hugo_Symbol of "Unknown" were removed.
  • 3 variants were removed due to a mismatch with the GRCh37 reference when converting from MAF to VCF description.
  • 684 variants failed to lift over from GRCh37 to GRCh38 and were removed.
  • 2 variants on contig 22_KI270928v1_alt were removed.