Assignment 2: Gene Expression Analysis & Interpretation

Author

Conor Heffron - 23211267

Introduction

In this report, I will analyse a publicly available dataset based on clinical breast cancer data. Breast cancer is the most diagnosed cancer in women. There are several subtypes of diseases characterized by different genetic drivers for cancer risk and tumour growth. The human epidermal growth factor receptor 2 amplified (HER2: ERBB2 / ERBB2IP) breast cancer is one of the most aggressive subtypes. In addition, I will investigate HER3 (ERBB3), HER4 (ERBB4), PIK3C2B, MDM4, LRRN2, NFASC, KLHDC8A, and CDK18 gene mutations. Although there are targeted therapies that have been developed to treat these cancer cases, the response rate ranges from 40% - 50%. I will download, decompress, clean and process the TCGA RNASeq data for breast cancer from cbioportal and identify the differentially expressed genes between ERBB2 / ERBB2IP, ERBB3, ERBB4, PIK3C2B, MDM4, LRRN2, NFASC, KLHDC8A, and CDK18 cancer tumours.

Note

The dataset can be downloaded from this link:
- https://www.cbioportal.org/study/summary?id=brca_tcga_pan_can_atlas_2018.

Methods Overview

The methods to import data are from the rio package. To manipulate, analyse and query the data the tidyverse package includes several libraries. In particular, I have heavily used the dplyr package and methods such as filter to generate summary tables after data analysis and enrichment processes which are described and commented in the code chunks in an incremental fashion. I have implemented and imported a utility script written in R to assist in the loading, analysis, and aggregation of the TCGA data. The analysis was completed in a step by step fashion to help with my biological interpretation of the results of this analysis. This helped with the selection of features and values for deeper analysis and investigation of smaller subsets of samples.

Biological Interpretation

The BRCA1 gene mutation is heavily associated with breast cancer. People who carry this gene mutation, have a hightened risk of developing cancer over time. Carriers of the BRCA1 gene often develop triple-negative, basal-like, aggressive breast tumours. Hormone signalling is pertinent in the inception of BRCA1 mutant breast cancers. Progesterone (PR) levels are clearly higher in BRCA1 mutation carriers and they have a higher risk of developing breast cancer with a low survival rate.
HER2 is a member of the human Epidermal Growth Factor Receptor (EGFR) family, which actuates the signalling pathways that promote cell proliferation & survival by dimerization with other EGFR family members. HER2 breast cancers are likely to benefit from chemotherapy and treatment targeted to HER2.
EGFR is a protein located on cells that help them to grow. A mutation in the EFGR gene can compel excessive growth which can cause cancer.
There are different breast cancer groups taken into account during the TCGA data analysis segments of this report. The main groups include Luminal tumours (A & B). Luminal A are tumours that are Oestrogen+ (ER+) & PR+ & HER2-. Luminal A breast cancers benefit from hormone therapy & may also benefit from chemotherapy. Luminal B breast cancerts can be HER- or HER+ & ER+. HER2 breast cancers are PR+.
HER3 is becoming a prominent biomarker for breast cancers (HER3 mRNA is expressed as Luminal tumours or ER+) as it is essential for cell survival in Luminal A and Luminal B but not basal normal mammary epithelium (basal like or triple negative breast cancers). Triple negative is the most aggresive form of breast cancer as they can groq and spread more quickly. The most difficult to treat compared to other invasive types of breast cancer because the cancer cells do not have the Oestrogen or Progesterone receptors or enough of the HER2 protein to make hormone therapy or targeted HER2 drugs work.
HER4 expression in Oestrogen receptor-positive breast cancer is associated with decreased sensitivity to tamoxifen treatment and reduced overall survival of post-menopausal women.

Incremental Analysis, Code & Results

The following graphics and summaries have the corresponding code chunks that shows how my analysis of the TCGA data evolved as I noticed patterns related to ER+, HER2, and upgraded/downgraded gene mutations.

Load packages, functions / methods and scripts

Code

library(knitr)
library(readr)
library(rio)
library(tools)
library(conflicted)  
library(dplyr)
library(tibble)
suppressMessages(suppressWarnings(library(DESeq2)))
library(ggplot2)

# resolve conflicts
suppressMessages(suppressWarnings(conflict_prefer("filter", "dplyr")))
suppressMessages(suppressWarnings(conflict_prefer("lag", "dplyr")))
suppressMessages(suppressWarnings(conflict_prefer("count", "dplyr")))
suppressMessages(suppressWarnings(conflict_prefer("select", "dplyr")))
suppressMessages(suppressWarnings(conflicts_prefer(GenomicRanges::setdiff)))

suppressMessages(suppressWarnings(source("assignment-2-utils.R")))

Note

Download the dataset and save to working directory (WD), see link to zip / tarball at https://www.cbioportal.org/study/summary?id=brca_tcga_pan_can_atlas_2018.

Code

path_wd <- "/Users/conorheffron/Desktop/assignment-2/"
setwd(path_wd)

Untar the folder and extract the files

Code

dir_name <- "brca_tcga_pan_can_atlas_2018"
extension <- ".tar.gz"
untar(paste(dir_name, extension, sep=""), files = NULL, list = FALSE, exdir = ".",
      extras = NULL, verbose = FALSE,
      restore_times =  TRUE,
      support_old_tars = Sys.getenv("R_SUPPORT_OLD_TARS", FALSE),
      tar = Sys.getenv("TAR"))

Important

Read the RNA Sequence data file: data_mrna_seq_v2_rsem.txt

Code

data_mrna <- import_data(dir_name, "^data_mrna_seq_v2_rsem.txt", 0)

[1] "data_mrna_seq_v2_rsem.txt - importing data"

Important

Read the Patient Data file: data_clinical_patient.txt

Code

data_clinical <- import_data(dir_name, "^data_clinical_patient", 4)

[1] "data_clinical_patient.txt - importing data"

Important

Read the Copy Number Aberrations (CNA) Data: data_cna.txt

Code

data_cna <- import_data(dir_name, "^data_cna", 0)

[1] "data_cna_hg19.seg is not needed for import..."
[1] "data_cna.txt - importing data"

Important

Read the Samples Data: data_clinical_sample.txt

Code

data_clinical_sample <- import_data(dir_name, "^data_clinical_sample", 4)

[1] "data_clinical_sample.txt - importing data"

Important

Create metadata using the Seq IDs of ERBB2+.

Code

keep <- !duplicated(data_mrna$data_mrna_seq_v2_rsem[, 1])
temp_df_mrna <- data_mrna$data_mrna_seq_v2_rsem[keep,]
temp_df_mrna <- rownames_to_column(as.data.frame(t(data_mrna$data_mrna_seq_v2_rsem |> filter(grepl("ERBB", Hugo_Symbol) | grepl("FAM72C", Hugo_Symbol) | grepl("SRGAP2D", Hugo_Symbol) | grepl("MDM4", Hugo_Symbol) | grepl("PIK3C2B", Hugo_Symbol) | grepl("LRRN2", Hugo_Symbol) | grepl("NFASC", Hugo_Symbol) | grepl("KLHDC8A", Hugo_Symbol) | grepl("LEMD1-AS1", Hugo_Symbol) | grepl("CDK18", Hugo_Symbol) | grepl("PLEKHA6", Hugo_Symbol)))), "row_names")

colnames(temp_df_mrna) <- temp_df_mrna[1,]
df_mrna_seq <- temp_df_mrna[-c(1, 2),]
df_mrna_seq <- df_mrna_seq |> dplyr::rename(PATIENT_ID_REF = Hugo_Symbol)
df_mrna_seq <- df_mrna_seq |> relocate(PATIENT_ID_REF)
df_mrna_seq[, 2:5] <- sapply(df_mrna_seq[, 2:5], as.numeric)
rownames(df_mrna_seq) <- NULL
df_mrna_seq <- df_mrna_seq %>% rename_with(~ paste(., "SEQ", sep = "_"))
df_mrna_seq$PATIENT_ID <- substr(df_mrna_seq$PATIENT_ID_REF_SEQ, 1, nchar(df_mrna_seq$PATIENT_ID_REF_SEQ) - 3)
df_mrna_seq <- df_mrna_seq |> relocate(PATIENT_ID)

Important

Create metadata using the CNA level IDs of ERBB2+ features etc.

Code

temp_cna_df <- data_cna$data_cna
df_cna_ids <- rownames_to_column(temp_cna_df, "row_names")
df_cna_ids <- setNames(data.frame(t(temp_cna_df[,-1])), temp_cna_df[,1])

erbb2_cols <- df_cna_ids[, grepl("ERBB", names(df_cna_ids)) | grepl("FAM72C", names(df_cna_ids)) | grepl("SRGAP2D", names(df_cna_ids)) | grepl("MDM4", names(df_cna_ids)) | grepl("PIK3C2B", names(df_cna_ids)) | grepl("LRRN2", names(df_cna_ids)) | grepl("NFASC", names(df_cna_ids)) | grepl("KLHDC8A", names(df_cna_ids)) | grepl("LEMD1-AS1", names(df_cna_ids)) | grepl("CDK18", names(df_cna_ids)) | grepl("PLEKHA6", names(df_cna_ids))]

erbb2_cols$PATIENT_ID_REF <- rownames(erbb2_cols)
erbb2_cols <- erbb2_cols |> relocate(PATIENT_ID_REF)
rownames(erbb2_cols) <- NULL
erbb2_cols = erbb2_cols[-1,]
erbb2_cols$PATIENT_ID <- substr(erbb2_cols$PATIENT_ID_REF, 1, nchar(erbb2_cols$PATIENT_ID_REF) - 3)

Important

Match the RNA Seq data with the CNA ids & the Patient Data
- Pathway Enrichment (Combination of enriched patient, sample, CNA and RNA Sequence data)

Code

# Merge RNA Seq data with CNA data  (ERBB2+ and other gene IDs meta data)
df_clin <- merge(x = df_mrna_seq, y = erbb2_cols, by = "PATIENT_ID", all = TRUE)

# Merge result with clinical patient data (data enrichment)
df_clin <- merge(x = df_clin, y = data_clinical$data_clinical_patient, by = "PATIENT_ID", all = TRUE)

# Merge in sample data by patient ID
df_clin <- merge(x = df_clin, y = data_clinical_sample$data_clinical_sample, by = "PATIENT_ID", all = TRUE)

Note

Check for top 10 mutations and have ER+ counts ready for amplified comparison (sums)

Code

temp_cna_df <- data_cna$data_cna
temp_cna_df[temp_cna_df < 0] <- 0
r_sums_cna <- temp_cna_df %>% 
  mutate(rowsums = select(., -c(1:2)) %>% rowSums(na.rm = TRUE))
r_sums_cna_ss <- select(r_sums_cna, c(Hugo_Symbol, rowsums))
all_r_sums_cna <- r_sums_cna_ss[order(r_sums_cna_ss$rowsums, decreasing = T),]
ebbr_r_sums_cna <- all_r_sums_cna |> filter(grepl("ERBB", Hugo_Symbol))

Warning

Equivalent Summary Table Snippet
- (First High Level breakdown, followed by further breakdown with SEQ data and then ER+ data)

Code

count_agg(data_clinical_sample$data_clinical_sample, "CANCER_TYPE_DETAILED", n_results=20, digits=0)

CANCER_TYPE_DETAILED	n	Freq
Breast Invasive Ductal Carcinoma	780	72
Breast Invasive Lobular Carcinoma	201	19
Breast Invasive Carcinoma (NOS)	77	7
Breast Invasive Mixed Mucinous Carcinoma	17	2
Metaplastic Breast Cancer	8	1
Invasive Breast Carcinoma	1	0

Code

count_agg(df_clin, "CANCER_TYPE_DETAILED", n_results=20, digits=2)

CANCER_TYPE_DETAILED	n	Freq
Breast Invasive Ductal Carcinoma	780	71.96
Breast Invasive Lobular Carcinoma	201	18.54
Breast Invasive Carcinoma (NOS)	77	7.10
Breast Invasive Mixed Mucinous Carcinoma	17	1.57
Metaplastic Breast Cancer	8	0.74
Invasive Breast Carcinoma	1	0.09

Code

count_agg(df_clin |> filter(ERBB2_SEQ > 0 & ERBB2 > 0), "CANCER_TYPE_DETAILED", n_results=20, digits=2)

CANCER_TYPE_DETAILED	n	Freq
Breast Invasive Ductal Carcinoma	268	81.71
Breast Invasive Lobular Carcinoma	37	11.28
Breast Invasive Carcinoma (NOS)	16	4.88
Breast Invasive Mixed Mucinous Carcinoma	4	1.22
Metaplastic Breast Cancer	3	0.91

Warning

Pie Charts from https://www.cbioportal.org/study/summary?id=brca_tcga_pan_can_atlas_2018 replicated as Summary Tables:

Code

count_agg(df_clin, "OS_STATUS", n_results=20, digits=2)

OS_STATUS	n	Freq
0:LIVING	933	86.07
1:DECEASED	151	13.93

Code

count_agg(df_clin, "SEX", n_results=20, digits=2)

SEX	n	Freq
Female	1072	98.89
Male	12	1.11

Code

count_agg(df_clin, "ETHNICITY", n_results=20, digits=2)

ETHNICITY	n	Freq
Not Hispanic Or Latino	877	80.90
	169	15.59
Hispanic Or Latino	38	3.51

Code

count_agg(df_clin, "RACE", n_results=20, digits=2)

RACE	n	Freq
White	751	69.28
Black or African American	182	16.79
	90	8.30
Asian	60	5.54
American Indian or Alaska Native	1	0.09

Code

count_agg(df_clin, "SUBTYPE", n_results=20, digits=2)

SUBTYPE	n	Freq
BRCA_LumA	499	46.03
BRCA_LumB	197	18.17
BRCA_Basal	171	15.77
	103	9.50
BRCA_Her2	78	7.20
BRCA_Normal	36	3.32

Equivalent Charts Snippet

Important

Not Amplified Summary Tables by other enrichment features
- Cancer type, cancer sub type, patient cancer status.

Code

count_agg(df_clin, "CANCER_TYPE_ACRONYM", n_results=20, digits=2)

CANCER_TYPE_ACRONYM	n	Freq
BRCA	1084	100

Code

count_agg(df_clin, "SUBTYPE", n_results=20, digits=2)

SUBTYPE	n	Freq
BRCA_LumA	499	46.03
BRCA_LumB	197	18.17
BRCA_Basal	171	15.77
	103	9.50
BRCA_Her2	78	7.20
BRCA_Normal	36	3.32

Code

count_agg(df_clin, "PERSON_NEOPLASM_CANCER_STATUS", n_results=20, digits=2)

PERSON_NEOPLASM_CANCER_STATUS	n	Freq
Tumor Free	870	80.26
	123	11.35
With Tumor	91	8.39

Important

ER+ Summary Tables

Code

count_agg(df_clin, "ERBB2", n_results=20, digits=2)

ERBB2	n	Freq
0	481	44.37
-1	260	23.99
1	206	19.00
2	123	11.35
NA	14	1.29

Code

count_agg(df_clin, "ERBB2IP", n_results=20, digits=2)

ERBB2IP	n	Freq
0	592	54.61
-1	281	25.92
1	187	17.25
NA	14	1.29
-2	10	0.92

Code

count_agg(df_clin, "ERBB3", n_results=20, digits=2)

ERBB3	n	Freq
0	701	64.67
1	218	20.11
-1	149	13.75
NA	14	1.29
2	2	0.18

Code

count_agg(df_clin, "ERBB4", n_results=20, digits=2)

ERBB4	n	Freq
0	710	65.50
-1	253	23.34
1	93	8.58
NA	14	1.29
-2	7	0.65
2	7	0.65

Important

ERBB2 Amplified data grouped by other columns

Code

count_agg(df_clin |> filter(ERBB2 > 0 & ERBB2_SEQ > 0), "CANCER_TYPE_ACRONYM", n_results=20, digits=2)

CANCER_TYPE_ACRONYM	n	Freq
BRCA	328	100

Code

count_agg(df_clin |> filter(ERBB2 > 0 & ERBB2_SEQ > 0), "SUBTYPE", n_results=20, digits=2)

SUBTYPE	n	Freq
BRCA_LumA	113	34.45
BRCA_LumB	93	28.35
BRCA_Her2	62	18.90
BRCA_Basal	29	8.84
	28	8.54
BRCA_Normal	3	0.91

Code

count_agg(df_clin |> filter(ERBB2 > 0 & ERBB2_SEQ > 0), "PERSON_NEOPLASM_CANCER_STATUS", n_results=20, digits=2)

PERSON_NEOPLASM_CANCER_STATUS	n	Freq
Tumor Free	261	79.57
	36	10.98
With Tumor	31	9.45

Important

Amplified by ERBB2 & MRNA Seq

Code

count_agg(df_clin |> filter(ERBB2 > 0 & ERBB2_SEQ > 0), "ERBB2", n_results=20, digits=2)

ERBB2	n	Freq
1	206	62.8
2	122	37.2

Amplified by ERBB2IP & MRNA Seq

Code

count_agg(df_clin |> filter(ERBB2IP > 0 & ERBB2IP_SEQ > 0), "ERBB2IP", n_results=20, digits=2)

ERBB2IP	n	Freq
1	187	100

Important

Amplified by ERBB3 & MRNA Seq

Code

count_agg(df_clin |> filter(ERBB3 > 0 & ERBB3_SEQ > 0), "ERBB3", n_results=20, digits=2)

ERBB3	n	Freq
1	218	99.09
2	2	0.91

Amplified by ERBB4 & MRNA Seq

Code

count_agg(df_clin |> filter(ERBB4 > 0 & ERBB4_SEQ > 0), "ERBB4", n_results=20, digits=2)

ERBB4	n	Freq
1	10	100

Warning

Load guide script and compare with count variable test_meta_erbb2_length.

Code

suppressWarnings(source("Assignment_Guide.R"))

Verify guide script count samples amplified by ERBB2 matches my code.
The counts now match after adding SEQ data filter for ERBB2 column (ERBB2_SEQ > 0)

Code

test_meta_erbb2_length <- length(meta_erbb2[meta_erbb2[,"ERBB2Amp"] == 1])
test_meta_erbb2_length

[1] 328

Code

length(meta_erbb2[meta_erbb2[,"ERBB2Amp"] == 0])

[1] 740

Code

length(meta_erbb2[meta_erbb2[,"ERBB2Amp"] == 0]) + length(meta_erbb2[meta_erbb2[,"ERBB2Amp"] == 1])

[1] 1068

Code

dim(rna_cna_sub)

[1] 20512  1068

Code

test_meta_erbb2_length == dim(df_clin |> filter(ERBB2_SEQ > 0 & ERBB2 > 0))[1]

[1] TRUE

Differential Expression Analysis

BRCA HER2+: Amplified by ERBB2 & Cancer Type Detailed Summary Table

Code

count_agg(df_clin |> filter(ERBB2_SEQ > 0 & ERBB2 > 0 & SUBTYPE == "BRCA_Her2"), "CANCER_TYPE_DETAILED", n_results=20, digits=2)

CANCER_TYPE_DETAILED	n	Freq
Breast Invasive Ductal Carcinoma	57	91.94
Breast Invasive Carcinoma (NOS)	2	3.23
Breast Invasive Lobular Carcinoma	2	3.23
Metaplastic Breast Cancer	1	1.61

BRCA HER2+: Amplified by ERBB2IP & Cancer Type Detailed Summary Table

Code

count_agg(df_clin |> filter(ERBB2IP_SEQ > 0 & ERBB2IP > 0 & SUBTYPE == "BRCA_Her2"), "CANCER_TYPE_DETAILED", n_results=20, digits=2)

CANCER_TYPE_DETAILED	n	Freq
Breast Invasive Ductal Carcinoma	7	87.5
Breast Invasive Lobular Carcinoma	1	12.5

BRCA HER2+: Amplified by ERBB3 & Cancer Type Detailed Summary Table

Code

count_agg(df_clin |> filter(ERBB3_SEQ > 0 & ERBB3 > 0 & SUBTYPE == "BRCA_Her2"), "CANCER_TYPE_DETAILED", n_results=20, digits=2)

CANCER_TYPE_DETAILED	n	Freq
Breast Invasive Ductal Carcinoma	17	80.95
Breast Invasive Lobular Carcinoma	3	14.29
Breast Invasive Carcinoma (NOS)	1	4.76

Note

ERBB4 not included as it is not relevant and no amplified results to summarise.

BRCA HER2: ERBB2 Summary Tables
Removing sequence data filter because *_SEQ filter for HER2- does not return any results

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "ERBB2", n_results=20, digits=2)

ERBB2	n	Freq
2	55	70.51
-1	8	10.26
0	8	10.26
1	7	8.97

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "ERBB2IP", n_results=20, digits=2)

ERBB2IP	n	Freq
-1	35	44.87
0	35	44.87
1	8	10.26

BRCA HER2: ERBB3 Summary Table

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "ERBB3", n_results=20, digits=2)

ERBB3	n	Freq
0	47	60.26
1	20	25.64
-1	10	12.82
2	1	1.28

BRCA HER2: ERBB4 Summary Table

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "ERBB4", n_results=20, digits=2)

ERBB4	n	Freq
0	39	50.00
-1	22	28.21
1	17	21.79

BRCA HER2: Cancer Type Detailed Summary Table

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "CANCER_TYPE_DETAILED", n_results=20, digits=2)

CANCER_TYPE_DETAILED	n	Freq
Breast Invasive Ductal Carcinoma	72	92.31
Breast Invasive Lobular Carcinoma	3	3.85
Breast Invasive Carcinoma (NOS)	2	2.56
Metaplastic Breast Cancer	1	1.28

BRCA HER2: Patient Status Summary Table

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "OS_STATUS", n_results=20, digits=2)

OS_STATUS	n	Freq
0:LIVING	63	80.77
1:DECEASED	15	19.23

BRCA HER2: MDM4 Summary Table

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "MDM4", n_results=20, digits=2)

MDM4	n	Freq
1	52	66.67
0	15	19.23
2	10	12.82
-1	1	1.28

BRCA HER2: LRRN2 Summary Table

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "LRRN2", n_results=20, digits=2)

LRRN2	n	Freq
1	52	66.67
0	15	19.23
2	10	12.82
-1	1	1.28

BRCA HER2: PIK3C2B Summary Table

Code

count_agg(df_clin |> filter(SUBTYPE == "BRCA_Her2"), "PIK3C2B", n_results=20, digits=2)

PIK3C2B	n	Freq
1	52	66.67
0	15	19.23
2	10	12.82
-1	1	1.28

Important

Normalize data using DESeq2 and Run DE gene analysis, generate PCA plots

DE Seq Run 1 (ERBB2)
The 2 principal components are ERBB2_SEQ & MDM4_SEQ for ERBB2 DE Seq Run grouped by patient status (0 for living & 1 for deceased)

Code

# Status is 1 or 0 which maps -> 0:LIVING & 1:DECEASED
de_ls1 <-
  pre_process_df(df_clin |> mutate(Status = as.numeric(substr(OS_STATUS, 1, 1))) |> filter(ERBB2 > 0 &
                                                                                             ERBB2_SEQ > 0) |>
                   select(
                     c(
                       Status,
                       ERBB2_SEQ,
                       ERBB2IP_SEQ,
                       ERBB3_SEQ,
                       ERBB4_SEQ,
                       MDM4_SEQ,
                       LRRN2_SEQ,
                       PIK3C2B_SEQ
                     )
                   ))
dds_run1 <-
  suppressMessages(suppressWarnings(DESeqDataSetFromMatrix(
    countData = de_ls1$countdata,
    colData = de_ls1$coldata,
    design = ~ ERBB2_SEQ
  )))
 suppressMessages(suppressWarnings(de_seq_run("Status", dds_run1)))

log2 fold change (MLE): ERBB2 SEQ 
Wald test p-value: ERBB2 SEQ 
DataFrame with 8 rows and 6 columns
               baseMean log2FoldChange       lfcSE      stat      pvalue
              <numeric>      <numeric>   <numeric> <numeric>   <numeric>
ERBB2_SEQ   4.43262e+04    2.64257e-05 6.82781e-07 38.703108 0.00000e+00
MDM4_SEQ    1.07397e+03   -3.19709e-06 4.14565e-07 -7.711912 1.23946e-14
ERBB4_SEQ   8.70415e+02   -1.00166e-05 1.56319e-06 -6.407794 1.47640e-10
LRRN2_SEQ   6.71901e+02   -5.03708e-06 1.14855e-06 -4.385605 1.15664e-05
ERBB2IP_SEQ 2.47022e+03   -1.78001e-06 4.26535e-07 -4.173187 3.00368e-05
ERBB3_SEQ   7.39463e+03   -1.70765e-06 5.27955e-07 -3.234462 1.21872e-03
PIK3C2B_SEQ 9.46785e+02    1.10020e-06 4.76158e-07  2.310584 2.08558e-02
Status      1.70048e-01   -7.42672e-07 3.84788e-06 -0.193008 8.46952e-01
                   padj
              <numeric>
ERBB2_SEQ   0.00000e+00
MDM4_SEQ    4.95786e-14
ERBB4_SEQ   3.93708e-10
LRRN2_SEQ   2.31327e-05
ERBB2IP_SEQ 4.80588e-05
ERBB3_SEQ   1.62496e-03
PIK3C2B_SEQ 2.38352e-02
Status      8.46952e-01

DE Seq Run 2 (ERBB2IP)
The 2 principal components are ERBB2IP_SEQ & PIK3C2B_SEQ for ERBB2IP DE Seq Run grouped by patient status (0 for living & 1 for deceased)

Code

de_ls2 <-
  pre_process_df(df_clin |> mutate(Status = as.numeric(substr(OS_STATUS, 1, 1))) |> filter(ERBB2IP > 0 & ERBB2IP_SEQ > 0) |>
                   select(
                     c(
                       Status,
                       ERBB2_SEQ,
                       ERBB2IP_SEQ,
                       ERBB3_SEQ,
                       ERBB4_SEQ,
                       MDM4_SEQ,
                       LRRN2_SEQ,
                       PIK3C2B_SEQ
                     )
                   ))
dds_run2 <-
  suppressMessages(suppressWarnings(DESeqDataSetFromMatrix(
    countData = de_ls2$countdata,
    colData = de_ls2$coldata,
    design = ~ ERBB2IP_SEQ
  )))
suppressMessages(suppressWarnings(de_seq_run("Status", dds_run2)))

log2 fold change (MLE): ERBB2IP SEQ 
Wald test p-value: ERBB2IP SEQ 
DataFrame with 8 rows and 6 columns
               baseMean log2FoldChange       lfcSE      stat      pvalue
              <numeric>      <numeric>   <numeric> <numeric>   <numeric>
ERBB2IP_SEQ 3.02377e+03    1.73541e-04 3.19770e-05  5.427064 5.72885e-08
PIK3C2B_SEQ 8.93973e+02   -1.58682e-04 3.44888e-05 -4.600976 4.20516e-06
LRRN2_SEQ   7.82808e+02   -3.25024e-04 7.71064e-05 -4.215267 2.49482e-05
ERBB2_SEQ   1.83024e+04   -3.77534e-04 1.06985e-04 -3.528854 4.17363e-04
ERBB4_SEQ   1.00909e+03    2.74506e-04 8.87036e-05  3.094640 1.97052e-03
ERBB3_SEQ   7.91247e+03    8.90916e-05 4.60256e-05  1.935697 5.29048e-02
MDM4_SEQ    1.14282e+03   -3.17019e-05 3.90457e-05 -0.811919 4.16838e-01
Status      1.41211e-01   -2.82167e-04 1.28899e-03 -0.218906 8.26723e-01
                   padj
              <numeric>
ERBB2IP_SEQ 4.58308e-07
PIK3C2B_SEQ 1.68206e-05
LRRN2_SEQ   6.65286e-05
ERBB2_SEQ   8.34727e-04
ERBB4_SEQ   3.15283e-03
ERBB3_SEQ   7.05398e-02
MDM4_SEQ    4.76386e-01
Status      8.26723e-01

DE Seq Run 3 (ERBB3)
The 2 principal components are ERBB3_SEQ & MDM4_SEQ for ERBB3 DE Seq Run grouped by patient status (0 for living & 1 for deceased)

Code

de_ls3 <-
  pre_process_df(df_clin |> mutate(Status = as.numeric(substr(OS_STATUS, 1, 1))) |> filter(ERBB3 > 0 & ERBB3_SEQ > 0) |>
                   select(
                     c(
                       Status,
                       ERBB2_SEQ,
                       ERBB2IP_SEQ,
                       ERBB3_SEQ,
                       ERBB4_SEQ,
                       MDM4_SEQ,
                       LRRN2_SEQ,
                       PIK3C2B_SEQ
                     )
                   ))
dds_run3 <-
  suppressMessages(suppressWarnings(DESeqDataSetFromMatrix(
    countData = de_ls3$countdata,
    colData = de_ls3$coldata,
    design = ~ ERBB3_SEQ
  )))
suppressMessages(suppressWarnings(de_seq_run("Status", dds_run3)))

log2 fold change (MLE): ERBB3 SEQ 
Wald test p-value: ERBB3 SEQ 
DataFrame with 8 rows and 6 columns
               baseMean log2FoldChange       lfcSE      stat      pvalue
              <numeric>      <numeric>   <numeric> <numeric>   <numeric>
ERBB3_SEQ   9.78153e+03    8.00922e-05 6.35230e-06 12.608375 1.89868e-36
MDM4_SEQ    1.09083e+03   -2.95370e-05 7.76117e-06 -3.805738 1.41382e-04
LRRN2_SEQ   6.45159e+02   -7.78044e-05 2.00852e-05 -3.873720 1.07186e-04
PIK3C2B_SEQ 8.81717e+02   -2.88337e-05 7.79687e-06 -3.698111 2.17210e-04
ERBB4_SEQ   9.76102e+02    5.60030e-05 2.43415e-05  2.300721 2.14074e-02
Status      1.60005e-01   -6.04383e-05 7.56041e-05 -0.799405 4.24056e-01
ERBB2IP_SEQ 2.49392e+03    4.53947e-06 8.03103e-06  0.565241 5.71910e-01
ERBB2_SEQ   1.99983e+04    1.03948e-05 2.44181e-05  0.425701 6.70326e-01
                   padj
              <numeric>
ERBB3_SEQ   1.51894e-35
MDM4_SEQ    3.77018e-04
LRRN2_SEQ   3.77018e-04
PIK3C2B_SEQ 4.34420e-04
ERBB4_SEQ   3.42518e-02
Status      5.65408e-01
ERBB2IP_SEQ 6.53611e-01
ERBB2_SEQ   6.70326e-01

DE Seq Run 4 (ERBB4)
The 2 principal components are ERBB4_SEQ & MDM4_SEQ for ERBB4 DE Seq Run grouped by patient status (0 for living & 1 for deceased)

Code

de_ls4 <-
  pre_process_df(df_clin |> mutate(Status = as.numeric(substr(OS_STATUS, 1, 1))) |> filter(ERBB4 > 0 & ERBB4_SEQ > 0) |>
                   select(
                     c(
                       Status,
                       ERBB2_SEQ,
                       ERBB2IP_SEQ,
                       ERBB3_SEQ,
                       ERBB4_SEQ,
                       MDM4_SEQ,
                       LRRN2_SEQ,
                       PIK3C2B_SEQ
                     )
                   ))
print(de_ls4$coldata)

      Status ERBB2_SEQ ERBB2IP_SEQ ERBB3_SEQ ERBB4_SEQ MDM4_SEQ LRRN2_SEQ
 [1,]      0      3577        3600      4916      1908      745       158
 [2,]      0      7586        1774      6981      2436     1292       393
 [3,]      0      4512        2000      3210      1916      946      2320
 [4,]      0      2638        2217      4095      2249     1022       854
 [5,]      0      7792        1811      6973      1174     1067       928
 [6,]      0      4312        1838      7305      1252      612        64
 [7,]      0      4163        3550      7711      1877      739      1302
 [8,]      0      5016        2462      7892      1228      678       454
 [9,]      0      2062        4450      3205      6078     1424       127
[10,]      1      8411        1846      8236      1301      904       981
      PIK3C2B_SEQ
 [1,]         926
 [2,]         876
 [3,]         525
 [4,]         644
 [5,]         753
 [6,]        1140
 [7,]        1482
 [8,]        1295
 [9,]         755
[10,]        1118

Code

dds_run4 <-
  suppressMessages(suppressWarnings(DESeqDataSetFromMatrix(
    countData = de_ls4$countdata,
    colData = de_ls4$coldata,
    design = ~ ERBB4_SEQ
  )))
suppressMessages(suppressWarnings(de_seq_run("Status", dds_run4)))

log2 fold change (MLE): ERBB4 SEQ 
Wald test p-value: ERBB4 SEQ 
DataFrame with 8 rows and 6 columns
               baseMean log2FoldChange       lfcSE       stat      pvalue
              <numeric>      <numeric>   <numeric>  <numeric>   <numeric>
ERBB4_SEQ   2220.831633    5.27406e-04 7.66146e-05  6.8838885 5.82405e-12
MDM4_SEQ     936.774611    2.43890e-04 7.57410e-05  3.2200518 1.28167e-03
ERBB2_SEQ   4743.502364   -2.45933e-04 9.18585e-05 -2.6773035 7.42174e-03
ERBB2IP_SEQ 2593.073566    2.72591e-04 1.11572e-04  2.4431823 1.45584e-02
ERBB3_SEQ   5868.304396   -1.86969e-04 8.83662e-05 -2.1158412 3.43583e-02
LRRN2_SEQ    701.828546   -4.42582e-04 2.78488e-04 -1.5892305 1.12008e-01
PIK3C2B_SEQ  935.070295   -5.23827e-05 1.18121e-04 -0.4434672 6.57428e-01
Status         0.081226   -6.52253e-05 1.14539e-03 -0.0569459 9.54588e-01
                   padj
              <numeric>
ERBB4_SEQ   4.65924e-11
MDM4_SEQ    5.12670e-03
ERBB2_SEQ   1.97913e-02
ERBB2IP_SEQ 2.91168e-02
ERBB3_SEQ   5.49733e-02
LRRN2_SEQ   1.49344e-01
PIK3C2B_SEQ 7.51346e-01
Status      9.54588e-01

DE Seq Run 5 (MDM4)
The 2 principal components are MDM4_SEQ & ERBB2IP_SEQ for MDM4 DE Seq Run grouped by patient status (0 for living & 1 for deceased)

Code

de_ls5 <-
  pre_process_df(df_clin |> mutate(Status = as.numeric(substr(OS_STATUS, 1, 1))) |> filter(MDM4 > 0 & MDM4_SEQ > 0) |>
                   select(
                     c(
                       Status,
                       ERBB2_SEQ,
                       ERBB2IP_SEQ,
                       ERBB3_SEQ,
                       ERBB4_SEQ,
                       MDM4_SEQ,
                       LRRN2_SEQ,
                       PIK3C2B_SEQ
                     )
                   ))
dds_run5 <-
  suppressMessages(suppressWarnings(DESeqDataSetFromMatrix(
    countData = de_ls5$countdata,
    colData = de_ls5$coldata,
    design = ~ MDM4_SEQ
  )))
suppressMessages(suppressWarnings(de_seq_run("Status", dds_run5)))

log2 fold change (MLE): MDM4 SEQ 
Wald test p-value: MDM4 SEQ 
DataFrame with 8 rows and 6 columns
               baseMean log2FoldChange       lfcSE      stat      pvalue
              <numeric>      <numeric>   <numeric> <numeric>   <numeric>
MDM4_SEQ    1413.862881    5.86591e-04 5.18331e-05 11.316922 1.08205e-29
ERBB2IP_SEQ 2428.981197   -1.47597e-04 6.88055e-05 -2.145130 3.19425e-02
LRRN2_SEQ    758.637500   -2.98945e-04 1.82434e-04 -1.638643 1.01288e-01
PIK3C2B_SEQ  911.947137   -1.35110e-04 8.24171e-05 -1.639349 1.01141e-01
ERBB2_SEQ   5385.630705   -1.07329e-04 8.53769e-05 -1.257124 2.08709e-01
Status         0.122042   -2.34863e-04 9.36742e-04 -0.250724 8.02028e-01
ERBB3_SEQ   6003.815103   -2.68901e-05 7.02650e-05 -0.382695 7.01946e-01
ERBB4_SEQ    945.032164    8.18780e-05 2.59663e-04  0.315324 7.52516e-01
                   padj
              <numeric>
MDM4_SEQ    8.65638e-29
ERBB2IP_SEQ 1.27770e-01
LRRN2_SEQ   2.02575e-01
PIK3C2B_SEQ 2.02575e-01
ERBB2_SEQ   3.33934e-01
Status      8.02028e-01
ERBB3_SEQ   8.02028e-01
ERBB4_SEQ   8.02028e-01

DE Seq Run 6 (LRNN2)
The 2 principal components are LRRN2_SEQ & ERBB2IP_SEQ for LRNN2 DE Seq Run grouped by patient status (0 for living & 1 for deceased)

Code

de_ls6 <-
  pre_process_df(df_clin |> mutate(Status = as.numeric(substr(OS_STATUS, 1, 1))) |> filter(LRRN2 > 0 & LRRN2_SEQ > 0) |>
                   select(
                     c(
                       Status,
                       ERBB2_SEQ,
                       ERBB2IP_SEQ,
                       ERBB3_SEQ,
                       ERBB4_SEQ,
                       MDM4_SEQ,
                       LRRN2_SEQ,
                       PIK3C2B_SEQ
                     )
                   ))
dds_run6 <-
  suppressMessages(suppressWarnings(DESeqDataSetFromMatrix(
    countData = de_ls6$countdata,
    colData = de_ls6$coldata,
    design = ~ LRRN2_SEQ
  )))
suppressMessages(suppressWarnings(de_seq_run("Status", dds_run6)))

log2 fold change (MLE): LRRN2 SEQ 
Wald test p-value: LRRN2 SEQ 
DataFrame with 8 rows and 6 columns
              baseMean log2FoldChange       lfcSE      stat      pvalue
             <numeric>      <numeric>   <numeric> <numeric>   <numeric>
LRRN2_SEQ   1690.86375    5.94369e-04 5.19608e-05 11.438809 2.67533e-30
ERBB2IP_SEQ 2174.58617   -1.28748e-04 6.96626e-05 -1.848162 6.45789e-02
ERBB3_SEQ   5619.76897   -1.33413e-04 7.27702e-05 -1.833345 6.67513e-02
ERBB2_SEQ   5784.72708   -6.99742e-05 6.03491e-05 -1.159491 2.46256e-01
PIK3C2B_SEQ  841.08082   -7.59215e-05 6.91094e-05 -1.098570 2.71956e-01
ERBB4_SEQ    814.68223    2.25254e-04 2.49301e-04  0.903544 3.66237e-01
Status         0.18505   -3.91644e-04 6.73050e-04 -0.581895 5.60638e-01
MDM4_SEQ    1100.85652   -2.82411e-05 7.30647e-05 -0.386521 6.99111e-01
                   padj
              <numeric>
LRRN2_SEQ   2.14027e-29
ERBB2IP_SEQ 1.78003e-01
ERBB3_SEQ   1.78003e-01
ERBB2_SEQ   4.35129e-01
PIK3C2B_SEQ 4.35129e-01
ERBB4_SEQ   4.88316e-01
Status      6.40729e-01
MDM4_SEQ    6.99111e-01

DE Seq Run 7 (PIK3C2B)
The 2 principal components are PIK3C2B_SEQ & ERBB2_SEQ for PIK3C2B DE Seq Run grouped by patient status (0 for living & 1 for deceased)

Code

de_ls7 <-
  pre_process_df(df_clin |> mutate(Status = as.numeric(substr(OS_STATUS, 1, 1))) |> filter(PIK3C2B > 0 & PIK3C2B_SEQ > 0) |>
                   select(
                     c(
                       Status,
                       ERBB2_SEQ,
                       ERBB2IP_SEQ,
                       ERBB3_SEQ,
                       ERBB4_SEQ,
                       MDM4_SEQ,
                       LRRN2_SEQ,
                       PIK3C2B_SEQ
                     )
                   ))
dds_run7 <-
  suppressMessages(suppressWarnings(DESeqDataSetFromMatrix(
    countData = de_ls7$countdata,
    colData = de_ls7$coldata,
    design = ~ PIK3C2B_SEQ
  )))
suppressMessages(suppressWarnings(de_seq_run("Status", dds_run7)))

log2 fold change (MLE): PIK3C2B SEQ 
Wald test p-value: PIK3C2B SEQ 
DataFrame with 8 rows and 6 columns
               baseMean log2FoldChange       lfcSE      stat      pvalue
              <numeric>      <numeric>   <numeric> <numeric>   <numeric>
PIK3C2B_SEQ 1305.258863    0.000822108 0.000093869  8.758029 1.98694e-18
ERBB2_SEQ   5831.200415   -0.000413143 0.000144945 -2.850340 4.36725e-03
ERBB3_SEQ   5958.388530   -0.000302321 0.000138666 -2.180213 2.92417e-02
ERBB2IP_SEQ 2370.047650   -0.000158254 0.000124985 -1.266186 2.05447e-01
ERBB4_SEQ    851.489384   -0.000775636 0.000542085 -1.430838 1.52477e-01
MDM4_SEQ    1175.744825    0.000214832 0.000140258  1.531688 1.25599e-01
LRRN2_SEQ    700.423822   -0.000439717 0.000327689 -1.341871 1.79638e-01
Status         0.111083   -0.000508982 0.002370282 -0.214735 8.29974e-01
                   padj
              <numeric>
PIK3C2B_SEQ 1.58956e-17
ERBB2_SEQ   1.74690e-02
ERBB3_SEQ   7.79779e-02
ERBB2IP_SEQ 2.34796e-01
ERBB4_SEQ   2.34796e-01
MDM4_SEQ    2.34796e-01
LRRN2_SEQ   2.34796e-01
Status      8.29974e-01

Important

Obtain Deferentially Expressed Genes

Top 10 Deferentially Expressed Genes Ranked (Upgraded)

Code

knitr::kable(all_r_sums_cna[c(1:10),])

	Hugo_Symbol	rowsums
1313	FAM72C	974
1386	SRGAP2D	969
2094	MDM4	912
2093	PIK3C2B	910
2095	LRRN2	908
2096	NFASC	908
2103	KLHDC8A	907
2104	LEMD1-AS1	907
2108	CDK18	907
2090	PLEKHA6	906

Code

# Hugo_Symbol   row_sums
# MDM4  912 
# PIK3C2B   910 
# LRRN2 908 
# NFASC 908 
# KLHDC8A   907 
# CDK18 907 
# ** denotes have SEQ data AND CNA data

ER+ Deferentially Expressed Genes Ranked (Upgraded)

Code

knitr::kable(ebbr_r_sums_cna)

Hugo_Symbol	rowsums
ERBB2	452
ERBB3	222
ERBB2IP	187
ERBB4	107

18 Downgraded Deferentially Expressed Genes Ranked
- TNFSF gene mutations (The Tumour Necrosis Factor Superfam) occur three times (1 combination) in the 18 downgraded ranked gene mutations. This is significant as these gene mutations could also be targeted for breast cancer treatment.

Code

knitr::kable(all_r_sums_cna[c((dim(all_r_sums_cna)[1])[1]:(dim(all_r_sums_cna)[1]-18)),])

	Hugo_Symbol	rowsums
18970	SOX15	52
18969	MPDU1	52
18967	SNORA67	52
18966	CD68	52
18965	SNORD10	52
18964	SNORA48	52
18963	EIF4A1	52
18961	SENP3	52
18960	SENP3-EIF4A1	52
19033	MYH2	53
19032	MYH1	53
19031	MYH4	53
18976	EFNB3	53
18975	WRAP53	53
18971	SHBG	53
18968	FXR2	53
18962	TNFSF13	53
18959	TNFSF12	53
18958	TNFSF12-TNFSF13	53

Summary Table per Selected Gene Mutation from Top 10 list (6x)

Code

count_agg(df_clin, "MDM4", n_results=20, digits=2)

MDM4	n	Freq
1	722	66.61
0	239	22.05
2	95	8.76
-1	14	1.29
NA	14	1.29

Code

count_agg(df_clin, "PIK3C2B", n_results=20, digits=2)

PIK3C2B	n	Freq
1	724	66.79
0	240	22.14
2	93	8.58
NA	14	1.29
-1	13	1.20

Code

count_agg(df_clin, "LRRN2", n_results=20, digits=2)

LRRN2	n	Freq
1	720	66.42
0	239	22.05
2	94	8.67
-1	16	1.48
NA	14	1.29
-2	1	0.09

Code

count_agg(df_clin, "NFASC", n_results=20, digits=2)

NFASC	n	Freq
1	718	66.24
0	239	22.05
2	95	8.76
-1	17	1.57
NA	14	1.29
-2	1	0.09

Code

count_agg(df_clin, "KLHDC8A", n_results=20, digits=2)

KLHDC8A	n	Freq
1	715	65.96
0	244	22.51
2	96	8.86
-1	14	1.29
NA	14	1.29
-2	1	0.09

Code

count_agg(df_clin, "CDK18", n_results=20, digits=2)

CDK18	n	Freq
1	713	65.77
0	244	22.51
2	97	8.95
-1	15	1.38
NA	14	1.29
-2	1	0.09

Important

Pathway Enrichment Analysis
- Create base data frame for amplified data (to filter down results) and then data frame for each ERBB2+ and top gene mutation columns amplified

Code

df_clin_amp_erbb_plus <- df_clin |> filter(ERBB2 > 0 | ERBB2IP > 0 | ERBB3 > 0 | ERBB2IP > 0) 

df_clin_amp_erbb2 <- df_clin |> filter(ERBB2 > 0 & ERBB2_SEQ > 0)
df_clin_amp_erbb2ip <- df_clin |> filter(ERBB2IP & ERBB2IP_SEQ > 0)
df_clin_amp_erbb3 <- df_clin |> filter(ERBB3 > 0 & ERBB3_SEQ > 0)
df_clin_amp_erbb4 <- df_clin |> filter(ERBB4 > 0 & ERBB4_SEQ > 0)

df_clin_amp_top_features <- df_clin |> filter(MDM4 > 0 | PIK3C2B > 0 | LRRN2 > 0 | NFASC > 0 | KLHDC8A > 0 | CDK18 > 0) 

df_clin_amp_mdm4 <- df_clin |> filter(MDM4 > 0 & MDM4_SEQ > 0)
df_clin_amp_pik3c2b <- df_clin |> filter(PIK3C2B & PIK3C2B_SEQ > 0)
df_clin_amp_lrrn2 <- df_clin |> filter(LRRN2 > 0 & LRRN2_SEQ > 0)
df_clin_amp_nfasc <- df_clin |> filter(NFASC > 0 & NFASC_SEQ > 0)
df_clin_amp_klhdc8a <- df_clin |> filter(KLHDC8A > 0 & KLHDC8A_SEQ > 0)
df_clin_amp_cdk18 <- df_clin |> filter(CDK18 > 0 & CDK18_SEQ > 0)

Important

Get the variance stabilized transformed expression values.

Code

erbbp_ls <- c(var(df_clin_amp_erbb2$ERBB2), var(df_clin_amp_erbb2ip$ERBB2IP), var(df_clin_amp_erbb3$ERBB3), var(df_clin_amp_erbb4$ERBB4))
matrix_erbbp <- matrix(erbbp_ls)
rownames(matrix_erbbp) <- c("ERBB2", "ERBB2IP", "ERBB3", "ERBB4")
colnames(matrix_erbbp) <- c("Variance")
matrix_erbbp

           Variance
ERBB2   0.234317894
ERBB2IP 1.008887832
ERBB3   0.009049398
ERBB4   0.000000000

Code

# Show sorted matrix variance values in descending order
matrix_erbbp[order(matrix_erbbp[,1],decreasing=T),]

    ERBB2IP       ERBB2       ERBB3       ERBB4 
1.008887832 0.234317894 0.009049398 0.000000000

Code

erbb_seq_ls <- c(var(df_clin_amp_erbb2$ERBB2_SEQ), var(df_clin_amp_erbb2ip$ERBB2IP_SEQ), var(df_clin_amp_erbb3$ERBB3_SEQ), var(df_clin_amp_erbb4$ERBB4_SEQ))
matrix_erbb_seq <- matrix(erbb_seq_ls)
rownames(matrix_erbb_seq) <- c("ERBB2_SEQ", "ERBB2IP_SEQ", "ERBB3_SEQ", "ERBB4_SEQ")
colnames(matrix_erbb_seq) <- c("Variance")
matrix_erbb_seq

              Variance
ERBB2_SEQ   4036630410
ERBB2IP_SEQ    1186963
ERBB3_SEQ     20891406
ERBB4_SEQ      2114973

Code

# Show sorted matrix variance values in descending order
matrix_erbb_seq[order(matrix_erbb_seq[,1], decreasing=T),]

  ERBB2_SEQ   ERBB3_SEQ   ERBB4_SEQ ERBB2IP_SEQ 
 4036630410    20891406     2114973     1186963

Code

# Other Top Mutations (6 from Top 10)
top_6_ls <- c(var(df_clin_amp_mdm4$MDM4), var(df_clin_amp_pik3c2b$PIK3C2B), var(df_clin_amp_lrrn2$LRRN2), var(df_clin_amp_nfasc$NFASC), var(df_clin_amp_klhdc8a$KLHDC8A), var(df_clin_amp_cdk18$CDK18))
matrix_top_6 <- matrix(top_6_ls)
rownames(matrix_top_6) <- c("MDM4", "PIK3C2B", "LRRN2", "NFASC", "KLHDC8A", "CDK18")
colnames(matrix_top_6) <- c("Variance")
matrix_top_6

          Variance
MDM4    0.11255187
PIK3C2B 0.14802490
LRRN2   0.10687089
NFASC   0.09014085
KLHDC8A 0.00000000
CDK18   0.10565544

Code

# Show sorted matrix variance values in descending order
matrix_top_6[order(matrix_top_6[,1],decreasing=T),]

   PIK3C2B       MDM4      LRRN2      CDK18      NFASC    KLHDC8A 
0.14802490 0.11255187 0.10687089 0.10565544 0.09014085 0.00000000

Conclusion

Gene Mutations PIK3C2B, MDM4, and LRRN2 are a good choice of gene IDs to target based on my analysis for treatment pathways. The amplified value frequencies and eventual variance values sorted in descending order from the available clinical & sequence data emphasizes this.
Phosphatidylinositol 4-Phosphate 3-Kinase, Catalytic Sub-Unit Type 2 Beta Gene (PIK3C2B). The PIK3C2B gene plays a part in hormone positive breast cancer cases. A mutation in the PIK3C2B gene can cause cells to split and replicate uncontrollably. It contributes to the growth of many cancers such as Metastatic Breast Cancer (MBC). If the tumour has a PIK3C2B mutation, then new treatments that specifically target this mutation could be used for treatment.
Mouse Double Minute 4 Homolog (MDM4) as a regulator of P53 is a protein coding gene. MDM4 promotes breast cancer and can impede the transcriptional activity of p53. The evidence is that MDM4 plays a notable part in breast cancer formation, progression and prognosis. It is reasonable to suggest this should be a targeted pathway.
MDM4 is a critical regulator of the tumour supressor p53. it restricts p53 transriptional activity & enables MDM2’s E3 ligase activity toward p53. These functions of MDM4 are vital for normal cell function and a true response to stress. The MDM2 gene is a gene whose product binds to p53 and regulates its functions. A differential expression of MDM2 gene in relation to Oestregen receptor status was found in human breast cancer cell lines. MDM4 is a rational target for treating breast cancers with mutated p53. It is a key driver of triple negative cancers.
Leucine Rich Repeat Neuronal 2 (LRRN2) was found to be amplified and overexpressed in breast cancer along with MDM4.

Note

Code

top_6_seq_ls <- c(var(df_clin_amp_mdm4$MDM4_SEQ), var(df_clin_amp_pik3c2b$PIK3C2B_SEQ), var(df_clin_amp_lrrn2$LRRN2_SEQ), var(df_clin_amp_nfasc$NFASC_SEQ), var(df_clin_amp_klhdc8a$KLHDC8A_SEQ), var(df_clin_amp_cdk18$CDK18_SEQ))
matrix_top_6_seq <- matrix(top_6_seq_ls)
rownames(matrix_top_6_seq) <- c("MDM4", "PIK3C2B", "LRRN2", "NFASC", "KLHDC8A", "CDK18")
colnames(matrix_top_6_seq) <- c("Variance")
matrix_top_6_seq

          Variance
MDM4     182025.63
PIK3C2B   83973.54
LRRN2    435329.73
NFASC   1153196.62
KLHDC8A 1275971.18
CDK18    192181.73

Code

# Show sorted matrix variance values in descending order
matrix_top_6_seq[order(matrix_top_6_seq[,1],decreasing=T),]

   KLHDC8A      NFASC      LRRN2      CDK18       MDM4    PIK3C2B 
1275971.18 1153196.62  435329.73  192181.73  182025.63   83973.54

Github

https://github.com/conorheffron/gene-expr

References

Other Formats