We first want to install all of the packages used in this analysis. We use the package renv
as a way to manage package versions. The following code was initially run to set up the package library.
install.packages("renv")
library(renv)
renv::init(bare = TRUE)
install.packages("BiocManager")
library(BiocManager)
install.packages(c("ggplot2", "magrittr", "knitr", "kableExtra", "ggrepel",
"rmarkdown", "formatR", "bitops", "caTools","cowplot",
"Rcpp","digest","evaluate","htmltools","jsonlite",
"base64enc","rprojroot","highr","xfun","stringi"),
repo = BiocManager::repositories())
renv::snapshot()
If you are attempting to reproduce this analysis, you can 1) install/load renv
, 2) copy/paste the renv.lock
file into your current working directory and 3) run the renv::restore()
command to automatically install all of the packages with the same version.
# Make sure the 'renv.lock' file is in your current working directory.
install.packages("renv")
library(renv)
renv::restore()
Next, R packages will be loaded and scripts will be sourced.
source("logistic_regression_util.R")
source("maf_utils.R")
library("ggplot2")
library("cowplot")
library("magrittr")
library("knitr")
library("kableExtra")
library("ggrepel")
knitr::opts_chunk$set(fig.width=12, fig.height=8)
The file containing Mutation burden (i.e. mutation per Mb) and MSI or POLE status was obtained from the DNA damage TCGA working group from the PanCanAtlas consortium. This work was published in Cell and can be found at:
https://www.cell.com/cell-reports/pdf/S2211-1247(18)30437-6.pdf
.
Data from the paper can be found at:
https://gdc.cancer.gov/about-data/publications/PanCan-DDR-2018
.
The ABSOLUTE call file can be found at:
http://api.gdc.cancer.gov/data/4f277128-f793-4354-a13d-30cc7fe9f6b5
.
# Read in mutation rate data
mut.rates = read.table("../../Data/Mutation_Rate_and-MSI.txt", header=TRUE, stringsAsFactors = FALSE, sep="\t", row.names = 1)
# Read in ABSOLUTE calls and combine with mut.rates
absolute = read.table("../../Data/TCGA_mastercalls.abs_tables_JSedit.fixed.txt", header = TRUE, stringsAsFactors = FALSE, sep = "\t", row.names = 1)
mut.rates$Ploidy = absolute[rownames(mut.rates), "ploidy"]
mut.rates$PloidyCorrected_MutationPerMb = log10(mut.rates$Non.silentMutationsperMb) / mut.rates$Ploidy
mut.rates$WGD = factor(ifelse(mut.rates$Genome_doublings > 0, "WGD", "Non-WGD"), levels = c("Non-WGD", "WGD"))
mut.rates$MSI_POLE_Status = factor(ifelse(mut.rates$MSIPOLE == "MSIPOLE", "Yes", "No"), levels = c("No", "Yes"))
mut.rates$Log10_MutsPerMB = log10(mut.rates$Non.silentMutationsperMb)
# Make tumor type a factor ordered by median mutation rate
agg = aggregate(mut.rates$Non.silentMutationsperMb,
by=list(mut.rates$Type),
"median")
tumor.type = agg[order(agg[,2]),1]
mut.rates$Type = factor(mut.rates$Type, levels = tumor.type)
# Create subsets
mut.rates.hyper = subset(mut.rates, MSI_POLE_Status == "Yes")
mut.rates.nonhyper = subset(mut.rates, MSI_POLE_Status == "No")
All tumor ids in mut.rates
match an entry in the PanCanAtlas MC3 maf, however some of the tumors are no longer present after filtering out poor quality variants. Therefore, the Tumor_Sample_Barcode
column within the maf is converted to a factor and the levels are set to the tumors found in mut.rates
so each tumor will get a column in the muts
table generated by xtabs
. The maf file was obtained from http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc
. PanCan MutSig2CV results were obtained from https://api.gdc.cancer.gov/data/2e8d4650-5498-4288-a461-23667ff553e2
.
# Read in file and subset to filtered variants
maf = read.table(gzfile("../../Data/mc3.v0.2.8.PUBLIC.maf.gz"), sep="\t", header=TRUE, stringsAsFactors = FALSE, quote="")
maf = subset(maf, FILTER %in% c("PASS", "wga", "native_wga_mix"))
# Substring ID and convert to factor with appropriate levels matching all tumors in mut.rates
maf$Tumor_Sample_Barcode = factor(substring(maf$Tumor_Sample_Barcode, 1, 15), levels = rownames(mut.rates))
# Subset to nonsynonymous variants
maf.c = maf.coding(maf)
muts = xtabs(~ Hugo_Symbol + Tumor_Sample_Barcode, data = maf.c)
# Subset further to significantly mutated genes from MutSig2CV
fs = list.files(path="../../Data/MutSig2CV/", full.names = TRUE)
fs = setdiff(fs, "../../Data/MutSig2CV//README.txt")
genes = c()
for(i in fs) {
temp = read.table(i, header=TRUE, stringsAsFactors=FALSE, sep="\t", quote="")
temp = subset(temp, qvalue < 0.1)
genes = c(genes, temp[,1])
}
genes = sort(intersect(unique(genes), rownames(muts)))
muts = muts[genes,]
# Clear up some memory
rm(maf)
rm(maf.c)
We need to examine the distribution of mutation rates to determine appropriate statistical tests. In parts A and B, the density of non-silent mutations is plotted across 9414 tumors from TCGA. Densities are plotted separately for tumors with MSI or POLE tumors, which are expected to have higher mutation rates in general. By applying a log10 transformation in part B, we can more clearly see the distubtions.
In parts B and C, the density of non-silent mutations is plotted across 9240 tumors from TCGA that do NOT fall into MSI or POLE categories. In part C, we can see that although hypermutated samples are excluded, there is still a long tailed distribution. The application of log10 in part D makes the distrubution appear more “bell-shapped”. Therefore, we should either use non-parametric tests (e.g. Wilcoxon) or use logged transformed values with parametric tests (e.g. t-test).
p1 = ggplot(mut.rates, aes(x=Non.silentMutationsperMb, fill=MSI_POLE_Status)) + geom_density(alpha = 0.5) + theme_bw() + ggtitle("All tumors, no log transformation")
p2 = ggplot(mut.rates, aes(x=Non.silentMutationsperMb, fill=MSI_POLE_Status)) + geom_density(alpha = 0.5) + theme_bw() + ggtitle("All tumors, with Log10 transformation") + scale_x_log10()
p3 = ggplot(mut.rates.nonhyper, aes(x=Non.silentMutationsperMb, fill=MSI_POLE_Status)) + geom_density(alpha = 0.5) + theme_bw() + ggtitle("Non-hypermutated tumors, no log transformation")
p4 = ggplot(mut.rates.nonhyper, aes(x=Non.silentMutationsperMb, fill=MSI_POLE_Status)) + geom_density(alpha = 0.5) + theme_bw() + ggtitle("Non-hypermutated tumors, with Log10 transformation") + scale_x_log10()
plot_grid(p1, p2, p3, p4, labels = "AUTO")
Next, we will use linear modeling to examine associations between mutation burden per Mb and WGD status while controlling for tumor type and MSI/POLE status. Note that in the “Estimate” column in the “Coefficients” section within the summary tables, the (Intercept)
is the baseline group and refers to the mean mutation rate for , non-WGD, non-MSI/POLE tumors. WGDWGD
refers to the difference between the baseline group and WGD tumors. MSI_POLE_StatusYes
refers to the difference between the baseline group and tumors with MSI/POLE status. Rows for each tumor type (e.g. TypeBLCA
) refer to the mean difference between each tumor type and the baseline group.
There are no significant associations between WGD status and mutation rate when using the non-transformed values. However, when using log-transformed values, we see strong associations with WGD status. This is true when including MSI/POLE tumors and controlling for this in the model AND when subsetting to non-MSI/POLE tumors and estimating a new model.
Overall, these results show that there is an overall strong association between WGD status and mutation rate when averaging across all tumor types . This analysis does not show whether this trend is true for all tumor types or if a subset of tumor types are driving the association.
# Controlling for tumor type and Hypermutation
model.full = lm(Non.silentMutationsperMb ~ WGD + MSI_POLE_Status + Type, data=mut.rates)
summary(model.full)
##
## Call:
## lm(formula = Non.silentMutationsperMb ~ WGD + MSI_POLE_Status +
## Type, data = mut.rates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81.93 -1.90 -0.61 0.07 650.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.27857 1.36847 0.204 0.838698
## WGDWGD -0.07853 0.40587 -0.193 0.846574
## MSI_POLE_StatusYes 76.16014 1.43716 52.994 < 2e-16 ***
## TypeTHCA 0.01462 1.58542 0.009 0.992641
## TypeTHYM 0.16719 2.16576 0.077 0.938469
## TypeTGCT 0.12371 2.00686 0.062 0.950848
## TypeUVM 0.20736 2.35327 0.088 0.929786
## TypeLAML 1.07605 2.13611 0.504 0.614455
## TypeKICH -0.68550 2.52654 -0.271 0.786151
## TypeACC 0.17775 2.27859 0.078 0.937822
## TypeMESO 0.48090 2.36288 0.204 0.838730
## TypeCHOL 0.84852 3.16544 0.268 0.788660
## TypePRAD 0.80560 1.58180 0.509 0.610558
## TypeLGG 1.05494 1.56799 0.673 0.501094
## TypeSARC 0.67159 1.78782 0.376 0.707185
## TypePAAD 2.73553 1.94006 1.410 0.158565
## TypeBRCA 1.52186 1.47839 1.029 0.303317
## TypeUCS 2.05381 2.68098 0.766 0.443655
## TypeGBM 2.53184 1.63092 1.552 0.120600
## TypeKIRC 1.20545 1.64979 0.731 0.465000
## TypeKIRP 1.18523 1.71607 0.691 0.489792
## TypeUCEC 6.39443 1.60428 3.986 6.77e-05 ***
## TypeCESC 3.36050 1.98163 1.696 0.089952 .
## TypeLIHC 2.22482 1.68165 1.323 0.185869
## TypeOV 2.80459 1.63025 1.720 0.085404 .
## TypeESCA 2.28297 1.93751 1.178 0.238706
## TypeREAD 6.41732 2.07244 3.097 0.001964 **
## TypeDLBC 2.81262 3.13054 0.898 0.368972
## TypeHNSC 3.62029 1.57362 2.301 0.021436 *
## TypeCOAD 4.13320 1.64851 2.507 0.012185 *
## TypeSTAD 8.42660 1.60426 5.253 1.53e-07 ***
## TypeBLCA 6.15916 1.62559 3.789 0.000152 ***
## TypeLUAD 7.61471 1.57895 4.823 1.44e-06 ***
## TypeLUSC 7.11767 1.58949 4.478 7.63e-06 ***
## TypeSKCM 14.45046 1.58825 9.098 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.13 on 9379 degrees of freedom
## Multiple R-squared: 0.2972, Adjusted R-squared: 0.2946
## F-statistic: 116.7 on 34 and 9379 DF, p-value: < 2.2e-16
# Controlling for tumor type and Hypermutation with log transformation
model.full.log10 = lm(log10(Non.silentMutationsperMb) ~ WGD + MSI_POLE_Status + Type, data=mut.rates)
summary(model.full.log10)
##
## Call:
## lm(formula = log10(Non.silentMutationsperMb) ~ WGD + MSI_POLE_Status +
## Type, data = mut.rates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34038 -0.19150 -0.01097 0.16847 2.16187
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.659856 0.029526 -22.348 < 2e-16 ***
## WGDWGD 0.087711 0.008757 10.016 < 2e-16 ***
## MSI_POLE_StatusYes 1.181884 0.031008 38.116 < 2e-16 ***
## TypeTHCA 0.024806 0.034207 0.725 0.46836
## TypeTHYM 0.032610 0.046728 0.698 0.48528
## TypeTGCT 0.014279 0.043300 0.330 0.74158
## TypeUVM 0.175998 0.050774 3.466 0.00053 ***
## TypeLAML 0.150160 0.046088 3.258 0.00113 **
## TypeKICH 0.309860 0.054512 5.684 1.35e-08 ***
## TypeACC 0.495454 0.049163 10.078 < 2e-16 ***
## TypeMESO 0.445372 0.050981 8.736 < 2e-16 ***
## TypeCHOL 0.493489 0.068297 7.226 5.38e-13 ***
## TypePRAD 0.485802 0.034129 14.234 < 2e-16 ***
## TypeLGG 0.481022 0.033831 14.218 < 2e-16 ***
## TypeSARC 0.498500 0.038574 12.923 < 2e-16 ***
## TypePAAD 0.556086 0.041858 13.285 < 2e-16 ***
## TypeBRCA 0.661379 0.031898 20.734 < 2e-16 ***
## TypeUCS 0.751159 0.057844 12.986 < 2e-16 ***
## TypeGBM 0.830007 0.035188 23.588 < 2e-16 ***
## TypeKIRC 0.755898 0.035596 21.236 < 2e-16 ***
## TypeKIRP 0.761471 0.037026 20.566 < 2e-16 ***
## TypeUCEC 0.968828 0.034614 27.990 < 2e-16 ***
## TypeCESC 0.971555 0.042755 22.724 < 2e-16 ***
## TypeLIHC 0.914194 0.036283 25.196 < 2e-16 ***
## TypeOV 0.921154 0.035174 26.188 < 2e-16 ***
## TypeESCA 0.961971 0.041803 23.012 < 2e-16 ***
## TypeREAD 1.029012 0.044715 23.013 < 2e-16 ***
## TypeDLBC 1.057293 0.067544 15.653 < 2e-16 ***
## TypeHNSC 1.065720 0.033952 31.389 < 2e-16 ***
## TypeCOAD 1.162253 0.035568 32.677 < 2e-16 ***
## TypeSTAD 1.216415 0.034613 35.143 < 2e-16 ***
## TypeBLCA 1.274766 0.035073 36.346 < 2e-16 ***
## TypeLUAD 1.284862 0.034067 37.716 < 2e-16 ***
## TypeLUSC 1.389252 0.034295 40.509 < 2e-16 ***
## TypeSKCM 1.499240 0.034268 43.751 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3696 on 9379 degrees of freedom
## Multiple R-squared: 0.6017, Adjusted R-squared: 0.6002
## F-statistic: 416.7 on 34 and 9379 DF, p-value: < 2.2e-16
# Testing hypermutated samples only
model.hyper = lm(Non.silentMutationsperMb ~ WGD + Type, data=mut.rates.hyper)
summary(model.hyper)
##
## Call:
## lm(formula = Non.silentMutationsperMb ~ WGD + Type, data = mut.rates.hyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -161.55 -69.97 -37.57 33.57 651.01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.967 116.849 -0.025 0.9798
## WGDWGD 19.460 50.694 0.384 0.7016
## TypeACC -11.331 128.941 -0.088 0.9301
## TypePRAD 191.176 157.281 1.216 0.2260
## TypeLGG 324.692 157.281 2.064 0.0406 *
## TypeSARC 4.085 157.281 0.026 0.9793
## TypePAAD 389.267 157.281 2.475 0.0144 *
## TypeBRCA 22.773 117.097 0.194 0.8461
## TypeUCS 106.988 157.281 0.680 0.4974
## TypeGBM 170.342 122.735 1.388 0.1672
## TypeUCEC 85.189 117.058 0.728 0.4679
## TypeCESC 47.300 157.281 0.301 0.7640
## TypeESCA -11.979 148.888 -0.080 0.9360
## TypeREAD 194.948 128.159 1.521 0.1302
## TypeCOAD 47.550 118.022 0.403 0.6876
## TypeSTAD 114.071 125.978 0.905 0.3666
## TypeBLCA 47.809 131.408 0.364 0.7165
## TypeSKCM 36.950 157.281 0.235 0.8146
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 105.3 on 156 degrees of freedom
## Multiple R-squared: 0.1566, Adjusted R-squared: 0.06472
## F-statistic: 1.704 on 17 and 156 DF, p-value: 0.04708
# Testing hypermutated samples only with log transformation
model.hyper.log10 = lm(log10(Non.silentMutationsperMb) ~ WGD + Type, data=mut.rates.hyper)
summary(model.hyper.log10)
##
## Call:
## lm(formula = log10(Non.silentMutationsperMb) ~ WGD + Type, data = mut.rates.hyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5723 -0.4571 -0.0354 0.4282 1.7118
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.23593 0.67551 1.830 0.0692 .
## WGDWGD -0.01863 0.29306 -0.064 0.9494
## TypeACC -0.69754 0.74541 -0.936 0.3508
## TypePRAD 1.03871 0.90925 1.142 0.2550
## TypeLGG 1.27156 0.90925 1.398 0.1640
## TypeSARC -1.18738 0.90925 -1.306 0.1935
## TypePAAD 1.35100 0.90925 1.486 0.1393
## TypeBRCA -0.74880 0.67695 -1.106 0.2704
## TypeUCS 0.78120 0.90925 0.859 0.3916
## TypeGBM 0.77676 0.70954 1.095 0.2753
## TypeUCEC 0.27193 0.67672 0.402 0.6884
## TypeCESC 0.41080 0.90925 0.452 0.6520
## TypeESCA -0.56276 0.86073 -0.654 0.5142
## TypeREAD 0.90232 0.74089 1.218 0.2251
## TypeCOAD 0.25087 0.68229 0.368 0.7136
## TypeSTAD 0.71108 0.72829 0.976 0.3304
## TypeBLCA 0.21315 0.75968 0.281 0.7794
## TypeSKCM 0.29534 0.90925 0.325 0.7458
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6086 on 156 degrees of freedom
## Multiple R-squared: 0.2226, Adjusted R-squared: 0.1379
## F-statistic: 2.628 on 17 and 156 DF, p-value: 0.0008978
# Testing non-hypermutated samples only
model.nonhyper = lm(Non.silentMutationsperMb ~ WGD + Type, data=mut.rates.nonhyper)
summary(model.nonhyper)
##
## Call:
## lm(formula = Non.silentMutationsperMb ~ WGD + Type, data = mut.rates.nonhyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.819 -1.690 -0.458 0.102 271.100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.27718 0.69589 0.398 0.690412
## WGDWGD -0.06941 0.20703 -0.335 0.737425
## TypeTHCA 0.01578 0.80620 0.020 0.984386
## TypeTHYM 0.16762 1.10131 0.152 0.879033
## TypeTGCT 0.11636 1.02059 0.114 0.909227
## TypeUVM 0.20819 1.19666 0.174 0.861892
## TypeLAML 1.07711 1.08623 0.992 0.321420
## TypeKICH 0.23932 1.29180 0.185 0.853026
## TypeACC 1.81514 1.16679 1.556 0.119820
## TypeMESO 0.48056 1.20155 0.400 0.689199
## TypeCHOL 0.84789 1.60966 0.527 0.598378
## TypePRAD 0.56660 0.80458 0.704 0.481313
## TypeLGG 0.56582 0.79753 0.709 0.478054
## TypeSARC 1.00938 0.90998 1.109 0.267359
## TypePAAD 0.74023 0.98814 0.749 0.453807
## TypeBRCA 1.76034 0.75202 2.341 0.019263 *
## TypeUCS 1.58325 1.37253 1.154 0.248725
## TypeGBM 1.70511 0.83033 2.054 0.040048 *
## TypeKIRC 1.20541 0.83894 1.437 0.150799
## TypeKIRP 1.18602 0.87264 1.359 0.174140
## TypeUCEC 6.53743 0.82614 7.913 2.80e-15 ***
## TypeCESC 3.60851 1.00953 3.574 0.000353 ***
## TypeLIHC 2.22301 0.85514 2.600 0.009349 **
## TypeOV 2.80091 0.82903 3.379 0.000732 ***
## TypeESCA 2.75142 0.98673 2.788 0.005307 **
## TypeREAD 2.71502 1.06367 2.552 0.010711 *
## TypeDLBC 2.81179 1.59191 1.766 0.077379 .
## TypeHNSC 3.61806 0.80021 4.521 6.22e-06 ***
## TypeCOAD 6.45664 0.84563 7.635 2.48e-14 ***
## TypeSTAD 8.11038 0.81709 9.926 < 2e-16 ***
## TypeBLCA 6.29669 0.82722 7.612 2.97e-14 ***
## TypeLUAD 7.61072 0.80294 9.479 < 2e-16 ***
## TypeLUSC 7.11400 0.80830 8.801 < 2e-16 ***
## TypeSKCM 14.57326 0.80788 18.039 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.71 on 9206 degrees of freedom
## Multiple R-squared: 0.1477, Adjusted R-squared: 0.1447
## F-statistic: 48.36 on 33 and 9206 DF, p-value: < 2.2e-16
# Testing non-hypermutated samples only with log transformation
model.nonhyper.log10 = lm(log10(Non.silentMutationsperMb) ~ WGD + Type, data=mut.rates.nonhyper)
summary(model.nonhyper.log10)
##
## Call:
## lm(formula = log10(Non.silentMutationsperMb) ~ WGD + Type, data = mut.rates.nonhyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34097 -0.18836 -0.00993 0.16700 2.16255
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.660044 0.028925 -22.819 < 2e-16 ***
## WGDWGD 0.088940 0.008605 10.336 < 2e-16 ***
## TypeTHCA 0.024961 0.033510 0.745 0.456353
## TypeTHYM 0.032668 0.045776 0.714 0.475467
## TypeTGCT 0.013290 0.042421 0.313 0.754076
## TypeUVM 0.176109 0.049739 3.541 0.000401 ***
## TypeLAML 0.150303 0.045149 3.329 0.000875 ***
## TypeKICH 0.305224 0.053694 5.685 1.35e-08 ***
## TypeACC 0.508450 0.048498 10.484 < 2e-16 ***
## TypeMESO 0.445326 0.049942 8.917 < 2e-16 ***
## TypeCHOL 0.493403 0.066905 7.375 1.79e-13 ***
## TypePRAD 0.483156 0.033443 14.447 < 2e-16 ***
## TypeLGG 0.477800 0.033150 14.413 < 2e-16 ***
## TypeSARC 0.502457 0.037823 13.284 < 2e-16 ***
## TypePAAD 0.546158 0.041072 13.298 < 2e-16 ***
## TypeBRCA 0.664974 0.031258 21.274 < 2e-16 ***
## TypeUCS 0.736770 0.057049 12.915 < 2e-16 ***
## TypeGBM 0.825220 0.034513 23.911 < 2e-16 ***
## TypeKIRC 0.755893 0.034870 21.677 < 2e-16 ***
## TypeKIRP 0.761577 0.036271 20.997 < 2e-16 ***
## TypeUCEC 0.963528 0.034339 28.060 < 2e-16 ***
## TypeCESC 0.970240 0.041961 23.122 < 2e-16 ***
## TypeLIHC 0.913951 0.035544 25.713 < 2e-16 ***
## TypeOV 0.920658 0.034459 26.718 < 2e-16 ***
## TypeESCA 0.967312 0.041013 23.585 < 2e-16 ***
## TypeREAD 1.008680 0.044212 22.815 < 2e-16 ***
## TypeDLBC 1.057182 0.066168 15.977 < 2e-16 ***
## TypeHNSC 1.065420 0.033261 32.032 < 2e-16 ***
## TypeCOAD 1.175318 0.035149 33.438 < 2e-16 ***
## TypeSTAD 1.213602 0.033962 35.734 < 2e-16 ***
## TypeBLCA 1.276233 0.034384 37.118 < 2e-16 ***
## TypeLUAD 1.284325 0.033374 38.482 < 2e-16 ***
## TypeLUSC 1.388758 0.033597 41.336 < 2e-16 ***
## TypeSKCM 1.500010 0.033580 44.670 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.362 on 9206 degrees of freedom
## Multiple R-squared: 0.5766, Adjusted R-squared: 0.575
## F-statistic: 379.9 on 33 and 9206 DF, p-value: < 2.2e-16
Next, we calculate the differences between WGD and non-WGD samples within each tumor type using the non-parametric Wilcoxon rank-sum test.
res = c()
tumor.type = unique(unique(mut.rates.nonhyper$Type))
for(i in tumor.type) {
mut.rates.type = subset(mut.rates.nonhyper, Type == i)
model = wilcox.test(Non.silentMutationsperMb ~ WGD, data = mut.rates.type)
agg.median = aggregate(mut.rates.type$Non.silentMutationsperMb,
by=list(mut.rates.type$WGD),
median)
agg.mean = aggregate(mut.rates.type$Non.silentMutationsperMb,
by=list(mut.rates.type$WGD),
mean)
res = rbind(res, c(agg.median[,2], agg.mean[,2], model$p.value))
}
colnames(res) = c("Non-WGD_Median", "WGD_Median",
"Non-WGD_Mean", "WGD_Mean",
"Pvalue")
res = data.frame(Type = tumor.type,
res,
FDR=p.adjust(res[,"Pvalue"], "fdr"),
Bonferroni = p.adjust(res[,"Pvalue"], "bon"),
check.names = FALSE)
res = res[order(res$Pvalue),]
# Show results in table
kable(res, style = 'html', row.names = FALSE) %>%
kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "800px")
Type | Non-WGD_Median | WGD_Median | Non-WGD_Mean | WGD_Mean | Pvalue | FDR | Bonferroni |
---|---|---|---|---|---|---|---|
BRCA | 0.7887883 | 1.2911623 | 1.6425024 | 2.4745046 | 0.0000000 | 0.0000000 | 0.0000000 |
LUAD | 3.3057242 | 6.9627762 | 5.3857255 | 9.5580078 | 0.0000000 | 0.0000000 | 0.0000000 |
SARC | 0.5722591 | 0.9102237 | 0.8857385 | 1.6516875 | 0.0000000 | 0.0000000 | 0.0000000 |
BLCA | 3.1230637 | 5.7202472 | 4.9002245 | 7.5739151 | 0.0000000 | 0.0000000 | 0.0000000 |
PRAD | 0.6903266 | 0.9494249 | 0.8251362 | 0.9890484 | 0.0000003 | 0.0000023 | 0.0000114 |
LUSC | 5.6169853 | 6.7154968 | 6.2240072 | 8.2582152 | 0.0000309 | 0.0001700 | 0.0010199 |
LIHC | 1.8065517 | 2.3281210 | 2.3667956 | 2.6778088 | 0.0003893 | 0.0018351 | 0.0128459 |
THYM | 0.2360255 | 0.4123456 | 0.4194798 | 0.5894340 | 0.0014128 | 0.0058279 | 0.0466235 |
COAD | 2.9237316 | 2.6972247 | 9.3617234 | 3.1911594 | 0.0053887 | 0.0176937 | 0.1778274 |
ACC | 0.5187245 | 0.7906970 | 1.7644122 | 2.3151797 | 0.0055410 | 0.0176937 | 0.1828544 |
DLBC | 2.3135876 | 4.0776043 | 2.6005717 | 4.5390239 | 0.0058979 | 0.0176937 | 0.1946312 |
LGG | 0.7095434 | 0.7982604 | 0.7667416 | 0.9274871 | 0.0167829 | 0.0440212 | 0.5538356 |
OV | 1.9435882 | 2.1925929 | 3.6942358 | 2.5163338 | 0.0173417 | 0.0440212 | 0.5722760 |
KIRC | 1.3492120 | 1.5275596 | 1.4253739 | 1.7204570 | 0.0380972 | 0.0898006 | 1.0000000 |
ESCA | 1.9362814 | 2.3858724 | 3.2732779 | 2.7673311 | 0.0428929 | 0.0943644 | 1.0000000 |
CHOL | 0.6594951 | 0.9153742 | 1.1797529 | 0.8642641 | 0.0486301 | 0.1002995 | 1.0000000 |
PAAD | 0.8976690 | 1.0303383 | 0.9533860 | 1.1304550 | 0.0556280 | 0.1079837 | 1.0000000 |
SKCM | 8.2243260 | 9.0626483 | 13.1679982 | 17.2816952 | 0.1007054 | 0.1846265 | 1.0000000 |
HNSC | 2.6010606 | 2.8967389 | 3.9823461 | 3.6935965 | 0.1178907 | 0.2047576 | 1.0000000 |
THCA | 0.2364548 | 0.3012428 | 0.2901206 | 0.3287041 | 0.1261393 | 0.2081298 | 1.0000000 |
PCPG | 0.2258893 | 0.2489308 | 0.2569747 | 0.3197262 | 0.1480959 | 0.2294626 | 1.0000000 |
KICH | 0.4203093 | 0.5863136 | 0.4770349 | 0.6882803 | 0.1529751 | 0.2294626 | 1.0000000 |
KIRP | 1.4209476 | 1.7404218 | 1.4333290 | 1.8170041 | 0.1726414 | 0.2477028 | 1.0000000 |
UCEC | 1.3706469 | 1.3505228 | 8.5867437 | 2.0701521 | 0.2183405 | 0.3002182 | 1.0000000 |
UVM | 0.3215546 | 0.3881548 | 0.4841662 | 0.4339046 | 0.2489854 | 0.3286607 | 1.0000000 |
MESO | 0.6233613 | 0.7900537 | 0.6622938 | 1.0955767 | 0.2686183 | 0.3409386 | 1.0000000 |
GBM | 1.3522352 | 1.4722469 | 2.0098155 | 1.7378375 | 0.3199001 | 0.3909891 | 1.0000000 |
STAD | 2.7489453 | 3.4362930 | 10.9640890 | 4.6330983 | 0.4673039 | 0.5480047 | 1.0000000 |
CESC | 1.7448954 | 2.3039068 | 4.3870496 | 2.8647043 | 0.4815799 | 0.5480047 | 1.0000000 |
READ | 2.3601104 | 2.5257189 | 3.2471526 | 2.6678267 | 0.5904206 | 0.6494627 | 1.0000000 |
TGCT | 0.2769972 | 0.2783775 | 0.3482366 | 0.3261160 | 0.6908412 | 0.7354116 | 1.0000000 |
UCS | 1.3342429 | 1.3009094 | 3.5672642 | 1.5004883 | 0.9160979 | 0.9447260 | 1.0000000 |
LAML | 0.3387778 | 0.3521584 | 1.3907336 | 0.3280533 | 0.9550171 | 0.9550171 | 1.0000000 |
We can plot the mutation counts per Mb for each tumor type divided into WGD and non-WGD.
ggplot(mut.rates.nonhyper, aes(x=Type, y=Non.silentMutationsperMb, fill=WGD)) +
geom_violin() +
theme_bw() +
scale_y_log10() +
ylab("Non-silent mutations per Mb (log10)") +
xlab("Tumor Type") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
We can also plot each tumor type individually to better see the differences between WGD and non-WGD tumors.
ggplot(mut.rates.nonhyper, aes(x=WGD, y=Non.silentMutationsperMb, fill=WGD)) +
geom_violin() +
geom_jitter(size = 0.11, color="grey") +
facet_wrap(~ Type) +
theme_bw() +
scale_y_log10() +
ylab("Non-silent mutations per Mb (log10)") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
We perform the same analysis as in the previous section, but add Ploidy as a covariate to correct for overall DNA content regardless of whether the increase in ploidy was due to WGD or not.
# Controlling for tumor type and Hypermutation
model.full.ploidy = lm(Non.silentMutationsperMb ~ WGD + Ploidy + MSI_POLE_Status + Type, data=mut.rates)
summary(model.full.ploidy)
##
## Call:
## lm(formula = Non.silentMutationsperMb ~ WGD + Ploidy + MSI_POLE_Status +
## Type, data = mut.rates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81.80 -1.89 -0.61 0.09 650.47
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.469920 1.631387 -0.288 0.773315
## WGDWGD -0.642181 0.782274 -0.821 0.411715
## Ploidy 0.383958 0.455550 0.843 0.399338
## MSI_POLE_StatusYes 76.151159 1.437217 52.985 < 2e-16 ***
## TypeTHCA -0.002643 1.585575 -0.002 0.998670
## TypeTHYM 0.152891 2.165859 0.071 0.943724
## TypeTGCT 0.265510 2.013934 0.132 0.895116
## TypeUVM 0.183187 2.353480 0.078 0.937960
## TypeLAML 1.044090 2.136476 0.489 0.625068
## TypeKICH -0.564078 2.530682 -0.223 0.823622
## TypeACC 0.295910 2.282937 0.130 0.896871
## TypeMESO 0.535937 2.363814 0.227 0.820642
## TypeCHOL 0.866579 3.165562 0.274 0.784281
## TypePRAD 0.788142 1.581964 0.498 0.618352
## TypeLGG 0.982236 1.570388 0.625 0.531676
## TypeSARC 0.713449 1.788533 0.399 0.689975
## TypePAAD 2.759837 1.940301 1.422 0.154950
## TypeBRCA 1.503869 1.478571 1.017 0.309127
## TypeUCS 2.018409 2.681349 0.753 0.451614
## TypeGBM 2.508383 1.631180 1.538 0.124138
## TypeKIRC 1.193586 1.649878 0.723 0.469428
## TypeKIRP 1.134958 1.717130 0.661 0.508653
## TypeUCEC 6.374479 1.604483 3.973 7.15e-05 ***
## TypeCESC 3.359699 1.981659 1.695 0.090033 .
## TypeLIHC 2.144415 1.684381 1.273 0.203008
## TypeOV 2.826552 1.630488 1.734 0.083029 .
## TypeESCA 2.344851 1.938932 1.209 0.226558
## TypeREAD 6.405565 2.072519 3.091 0.002003 **
## TypeDLBC 2.711858 3.132871 0.866 0.386724
## TypeHNSC 3.614407 1.573660 2.297 0.021651 *
## TypeCOAD 4.120510 1.648605 2.499 0.012458 *
## TypeSTAD 8.427906 1.604287 5.253 1.53e-07 ***
## TypeBLCA 6.122761 1.626188 3.765 0.000167 ***
## TypeLUAD 7.620255 1.578989 4.826 1.41e-06 ***
## TypeLUSC 7.155740 1.590157 4.500 6.88e-06 ***
## TypeSKCM 14.441269 1.588313 9.092 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.13 on 9378 degrees of freedom
## Multiple R-squared: 0.2972, Adjusted R-squared: 0.2946
## F-statistic: 113.3 on 35 and 9378 DF, p-value: < 2.2e-16
# Controlling for tumor type and Hypermutation with log transformation
model.full.log10.ploidy = lm(log10(Non.silentMutationsperMb) ~ WGD + Ploidy + MSI_POLE_Status + Type, data=mut.rates)
summary(model.full.log10.ploidy)
##
## Call:
## lm(formula = log10(Non.silentMutationsperMb) ~ WGD + Ploidy +
## MSI_POLE_Status + Type, data = mut.rates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34072 -0.19100 -0.01054 0.16822 2.16241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.684707 0.035197 -19.454 < 2e-16 ***
## WGDWGD 0.068997 0.016877 4.088 4.38e-05 ***
## Ploidy 0.012748 0.009828 1.297 0.194645
## MSI_POLE_StatusYes 1.181586 0.031008 38.106 < 2e-16 ***
## TypeTHCA 0.024233 0.034208 0.708 0.478726
## TypeTHYM 0.032135 0.046728 0.688 0.491654
## TypeTGCT 0.018987 0.043450 0.437 0.662134
## TypeUVM 0.175195 0.050776 3.450 0.000562 ***
## TypeLAML 0.149099 0.046094 3.235 0.001222 **
## TypeKICH 0.313892 0.054599 5.749 9.25e-09 ***
## TypeACC 0.499377 0.049254 10.139 < 2e-16 ***
## TypeMESO 0.447199 0.050999 8.769 < 2e-16 ***
## TypeCHOL 0.494088 0.068296 7.235 5.04e-13 ***
## TypePRAD 0.485222 0.034130 14.217 < 2e-16 ***
## TypeLGG 0.478608 0.033881 14.126 < 2e-16 ***
## TypeSARC 0.499890 0.038587 12.955 < 2e-16 ***
## TypePAAD 0.556893 0.041861 13.303 < 2e-16 ***
## TypeBRCA 0.660781 0.031900 20.714 < 2e-16 ***
## TypeUCS 0.749983 0.057849 12.964 < 2e-16 ***
## TypeGBM 0.829229 0.035192 23.563 < 2e-16 ***
## TypeKIRC 0.755505 0.035596 21.225 < 2e-16 ***
## TypeKIRP 0.759801 0.037047 20.509 < 2e-16 ***
## TypeUCEC 0.968166 0.034616 27.969 < 2e-16 ***
## TypeCESC 0.971528 0.042754 22.724 < 2e-16 ***
## TypeLIHC 0.911525 0.036340 25.083 < 2e-16 ***
## TypeOV 0.921883 0.035177 26.207 < 2e-16 ***
## TypeESCA 0.964025 0.041832 23.045 < 2e-16 ***
## TypeREAD 1.028622 0.044714 23.004 < 2e-16 ***
## TypeDLBC 1.053947 0.067591 15.593 < 2e-16 ***
## TypeHNSC 1.065525 0.033951 31.384 < 2e-16 ***
## TypeCOAD 1.161831 0.035568 32.665 < 2e-16 ***
## TypeSTAD 1.216458 0.034612 35.146 < 2e-16 ***
## TypeBLCA 1.273557 0.035085 36.300 < 2e-16 ***
## TypeLUAD 1.285047 0.034066 37.722 < 2e-16 ***
## TypeLUSC 1.390516 0.034307 40.531 < 2e-16 ***
## TypeSKCM 1.498934 0.034267 43.742 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3696 on 9378 degrees of freedom
## Multiple R-squared: 0.6018, Adjusted R-squared: 0.6003
## F-statistic: 404.9 on 35 and 9378 DF, p-value: < 2.2e-16
# Testing hypermutated samples only
model.hyper.ploidy = lm(Non.silentMutationsperMb ~ WGD + Ploidy + Type, data=mut.rates.hyper)
summary(model.hyper.ploidy)
##
## Call:
## lm(formula = Non.silentMutationsperMb ~ WGD + Ploidy + Type,
## data = mut.rates.hyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -162.45 -70.10 -37.90 33.96 652.84
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -16.228 138.642 -0.117 0.9070
## WGDWGD 2.556 107.211 0.024 0.9810
## Ploidy 10.585 59.098 0.179 0.8581
## TypeACC -11.490 129.346 -0.089 0.9293
## TypePRAD 183.586 163.364 1.124 0.2628
## TypeLGG 316.043 164.996 1.915 0.0573 .
## TypeSARC -2.446 161.932 -0.015 0.9880
## TypePAAD 380.407 165.346 2.301 0.0227 *
## TypeBRCA 14.317 126.594 0.113 0.9101
## TypeUCS 99.081 163.833 0.605 0.5462
## TypeGBM 163.931 128.216 1.279 0.2030
## TypeUCEC 77.043 125.923 0.612 0.5415
## TypeCESC 39.393 163.833 0.240 0.8103
## TypeESCA -11.556 149.371 -0.077 0.9384
## TypeREAD 186.775 136.415 1.369 0.1729
## TypeCOAD 39.198 127.242 0.308 0.7584
## TypeSTAD 105.380 135.366 0.778 0.4375
## TypeBLCA 42.056 135.675 0.310 0.7570
## TypeSKCM 28.831 164.155 0.176 0.8608
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 105.6 on 155 degrees of freedom
## Multiple R-squared: 0.1568, Adjusted R-squared: 0.05888
## F-statistic: 1.601 on 18 and 155 DF, p-value: 0.0657
# Testing hypermutated samples only with log transformation
model.hyper.log10.ploidy = lm(log10(Non.silentMutationsperMb) ~ WGD + Ploidy + Type, data=mut.rates.hyper)
summary(model.hyper.log10.ploidy)
##
## Call:
## lm(formula = log10(Non.silentMutationsperMb) ~ WGD + Ploidy +
## Type, data = mut.rates.hyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.43820 -0.45347 -0.03138 0.42339 1.66186
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.87026 0.79969 1.088 0.278
## WGDWGD -0.48474 0.61840 -0.784 0.434
## Ploidy 0.29185 0.34088 0.856 0.393
## TypeACC -0.70191 0.74607 -0.941 0.348
## TypePRAD 0.82943 0.94229 0.880 0.380
## TypeLGG 1.03309 0.95170 1.086 0.279
## TypeSARC -1.36748 0.93403 -1.464 0.145
## TypePAAD 1.10669 0.95372 1.160 0.248
## TypeBRCA -0.98195 0.73020 -1.345 0.181
## TypeUCS 0.56316 0.94499 0.596 0.552
## TypeGBM 0.59998 0.73955 0.811 0.418
## TypeUCEC 0.04733 0.72633 0.065 0.948
## TypeCESC 0.19277 0.94499 0.204 0.839
## TypeESCA -0.55109 0.86158 -0.640 0.523
## TypeREAD 0.67699 0.78685 0.860 0.391
## TypeCOAD 0.02059 0.73394 0.028 0.978
## TypeSTAD 0.47144 0.78080 0.604 0.547
## TypeBLCA 0.05452 0.78258 0.070 0.945
## TypeSKCM 0.07147 0.94685 0.075 0.940
##
## Residual standard error: 0.6091 on 155 degrees of freedom
## Multiple R-squared: 0.2263, Adjusted R-squared: 0.1364
## F-statistic: 2.519 on 18 and 155 DF, p-value: 0.001197
# Testing non-hypermutated samples only
model.nonhyper.ploidy = lm(Non.silentMutationsperMb ~ WGD + Ploidy + Type, data=mut.rates.nonhyper)
summary(model.nonhyper.ploidy)
##
## Call:
## lm(formula = Non.silentMutationsperMb ~ WGD + Ploidy + Type,
## data = mut.rates.nonhyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.827 -1.704 -0.453 0.110 271.101
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.325885 0.829876 -0.393 0.694557
## WGDWGD -0.523679 0.398596 -1.314 0.188943
## Ploidy 0.309365 0.231969 1.334 0.182352
## TypeTHCA 0.001851 0.806237 0.002 0.998168
## TypeTHYM 0.156093 1.101298 0.142 0.887292
## TypeTGCT 0.230715 1.024146 0.225 0.821770
## TypeUVM 0.188696 1.196700 0.158 0.874712
## TypeLAML 1.051344 1.086359 0.968 0.333186
## TypeKICH 0.335828 1.293773 0.260 0.795199
## TypeACC 1.908486 1.168837 1.633 0.102544
## TypeMESO 0.524912 1.201955 0.437 0.662328
## TypeCHOL 0.862447 1.609627 0.536 0.592106
## TypePRAD 0.552493 0.804619 0.687 0.492319
## TypeLGG 0.507210 0.798711 0.635 0.525421
## TypeSARC 1.043155 0.910297 1.146 0.251845
## TypePAAD 0.760189 0.988210 0.769 0.441759
## TypeBRCA 1.746050 0.752064 2.322 0.020272 *
## TypeUCS 1.554441 1.372642 1.132 0.257477
## TypeGBM 1.685991 0.830415 2.030 0.042355 *
## TypeKIRC 1.195854 0.838931 1.425 0.154061
## TypeKIRP 1.145507 0.873130 1.312 0.189568
## TypeUCEC 6.521257 0.826196 7.893 3.28e-15 ***
## TypeCESC 3.607939 1.009485 3.574 0.000353 ***
## TypeLIHC 2.158254 0.856484 2.520 0.011756 *
## TypeOV 2.818657 0.829099 3.400 0.000678 ***
## TypeESCA 2.800397 0.987372 2.836 0.004575 **
## TypeREAD 2.705811 1.063650 2.544 0.010979 *
## TypeDLBC 2.730619 1.593008 1.714 0.086539 .
## TypeHNSC 3.613351 0.800186 4.516 6.39e-06 ***
## TypeCOAD 6.447307 0.845626 7.624 2.70e-14 ***
## TypeSTAD 8.111851 0.817055 9.928 < 2e-16 ***
## TypeBLCA 6.267093 0.827485 7.574 3.98e-14 ***
## TypeLUAD 7.615243 0.802917 9.484 < 2e-16 ***
## TypeLUSC 7.144725 0.808593 8.836 < 2e-16 ***
## TypeSKCM 14.565894 0.807866 18.030 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.71 on 9205 degrees of freedom
## Multiple R-squared: 0.1479, Adjusted R-squared: 0.1448
## F-statistic: 46.99 on 34 and 9205 DF, p-value: < 2.2e-16
# Testing non-hypermutated samples only with log transformation
model.nonhyper.log10.ploidy = lm(log10(Non.silentMutationsperMb) ~ WGD + Ploidy + Type, data=mut.rates.nonhyper)
summary(model.nonhyper.log10.ploidy)
##
## Call:
## lm(formula = log10(Non.silentMutationsperMb) ~ WGD + Ploidy +
## Type, data = mut.rates.nonhyper)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34129 -0.18715 -0.00981 0.16656 2.16306
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.683436 0.034494 -19.813 < 2e-16 ***
## WGDWGD 0.071319 0.016568 4.305 1.69e-05 ***
## Ploidy 0.012000 0.009642 1.245 0.213319
## TypeTHCA 0.024421 0.033512 0.729 0.466182
## TypeTHYM 0.032221 0.045776 0.704 0.481530
## TypeTGCT 0.017725 0.042569 0.416 0.677137
## TypeUVM 0.175353 0.049741 3.525 0.000425 ***
## TypeLAML 0.149303 0.045155 3.306 0.000948 ***
## TypeKICH 0.308967 0.053776 5.745 9.46e-09 ***
## TypeACC 0.512071 0.048583 10.540 < 2e-16 ***
## TypeMESO 0.447047 0.049960 8.948 < 2e-16 ***
## TypeCHOL 0.493968 0.066905 7.383 1.68e-13 ***
## TypePRAD 0.482608 0.033444 14.430 < 2e-16 ***
## TypeLGG 0.475526 0.033199 14.324 < 2e-16 ***
## TypeSARC 0.503767 0.037837 13.314 < 2e-16 ***
## TypePAAD 0.546932 0.041075 13.315 < 2e-16 ***
## TypeBRCA 0.664420 0.031260 21.255 < 2e-16 ***
## TypeUCS 0.735653 0.057055 12.894 < 2e-16 ***
## TypeGBM 0.824479 0.034517 23.886 < 2e-16 ***
## TypeKIRC 0.755523 0.034871 21.666 < 2e-16 ***
## TypeKIRP 0.760006 0.036292 20.941 < 2e-16 ***
## TypeUCEC 0.962901 0.034341 28.039 < 2e-16 ***
## TypeCESC 0.970218 0.041960 23.123 < 2e-16 ***
## TypeLIHC 0.911439 0.035600 25.602 < 2e-16 ***
## TypeOV 0.921347 0.034462 26.735 < 2e-16 ***
## TypeESCA 0.969212 0.041041 23.616 < 2e-16 ***
## TypeREAD 1.008323 0.044211 22.807 < 2e-16 ***
## TypeDLBC 1.054033 0.066214 15.919 < 2e-16 ***
## TypeHNSC 1.065237 0.033260 32.027 < 2e-16 ***
## TypeCOAD 1.174956 0.035149 33.428 < 2e-16 ***
## TypeSTAD 1.213659 0.033961 35.737 < 2e-16 ***
## TypeBLCA 1.275086 0.034395 37.072 < 2e-16 ***
## TypeLUAD 1.284501 0.033374 38.488 < 2e-16 ***
## TypeLUSC 1.389950 0.033610 41.356 < 2e-16 ***
## TypeSKCM 1.499725 0.033579 44.662 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.362 on 9205 degrees of freedom
## Multiple R-squared: 0.5766, Adjusted R-squared: 0.5751
## F-statistic: 368.8 on 34 and 9205 DF, p-value: < 2.2e-16
In the last model examining non-hypermutated samples, we can see that mutation burden is still associated with WGD even after correcting for overall ploidy. Note that although the p-value is signficant the overall effect size is still relatively low, 0.07 in log10 scale is ~1.17, so WGD seems to increase mutation burden by about 1 per MB independent of DNA-content.
Next, we calculate the differences between WGD and non-WGD samples within each tumor type using the non-parametric Wilcoxon rank-sum test. In contrast to previous section, we divided the log10 mutation burden by the tumor’s ploidy to correction for overall DNA content.
res.ploidy = c()
tumor.type = unique(unique(mut.rates.nonhyper$Type))
for(i in tumor.type) {
mut.rates.type = subset(mut.rates.nonhyper, Type == i)
model = wilcox.test(PloidyCorrected_MutationPerMb ~ WGD, data = mut.rates.type)
agg.median = aggregate(mut.rates.type$PloidyCorrected_MutationPerMb,
by=list(mut.rates.type$WGD),
median)
agg.mean = aggregate(mut.rates.type$PloidyCorrected_MutationPerMb,
by=list(mut.rates.type$WGD),
mean)
res.ploidy = rbind(res.ploidy, c(agg.median[,2], agg.mean[,2], model$p.value))
}
colnames(res.ploidy) = c("Non-WGD_Median", "WGD_Median",
"Non-WGD_Mean", "WGD_Mean",
"Pvalue")
res.ploidy = data.frame(Type = tumor.type,
res.ploidy,
FDR=p.adjust(res.ploidy[,"Pvalue"], "fdr"),
Bonferroni = p.adjust(res.ploidy[,"Pvalue"], "bon"),
check.names = FALSE)
res.ploidy = res.ploidy[order(res.ploidy$Pvalue),]
# Show results in table
kable(res.ploidy, style = 'html', row.names = FALSE) %>%
kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "800px")
Type | Non-WGD_Median | WGD_Median | Non-WGD_Mean | WGD_Mean | Pvalue | FDR | Bonferroni |
---|---|---|---|---|---|---|---|
LUSC | 0.4033557 | 0.2512009 | 0.3823527 | 0.2590251 | 0.0000000 | 0.0000000 | 0.0000000 |
COAD | 0.2278502 | 0.1253376 | 0.3119621 | 0.1344684 | 0.0000000 | 0.0000000 | 0.0000000 |
BRCA | -0.0491793 | 0.0328883 | -0.0151825 | 0.0442232 | 0.0000000 | 0.0000000 | 0.0000000 |
SARC | -0.1253635 | -0.0125760 | -0.1143936 | -0.0008105 | 0.0000000 | 0.0000000 | 0.0000000 |
SKCM | 0.4664244 | 0.2824708 | 0.4241903 | 0.2808873 | 0.0000000 | 0.0000000 | 0.0000000 |
PRAD | -0.0814821 | -0.0063491 | -0.0923605 | -0.0063781 | 0.0000000 | 0.0000000 | 0.0000000 |
PCPG | -0.3242914 | -0.1716240 | -0.3429651 | -0.1619262 | 0.0000000 | 0.0000000 | 0.0000001 |
LGG | -0.0759703 | -0.0267069 | -0.0901267 | -0.0266994 | 0.0000000 | 0.0000000 | 0.0000001 |
HNSC | 0.2088784 | 0.1415845 | 0.2154824 | 0.1406598 | 0.0000000 | 0.0000001 | 0.0000012 |
READ | 0.1748569 | 0.1182970 | 0.1880058 | 0.1220811 | 0.0000004 | 0.0000013 | 0.0000128 |
OV | 0.1516856 | 0.1001366 | 0.1488547 | 0.1027001 | 0.0000037 | 0.0000112 | 0.0001227 |
STAD | 0.2216100 | 0.1548190 | 0.3027677 | 0.1716448 | 0.0000095 | 0.0000261 | 0.0003134 |
THYM | -0.3102500 | -0.0839461 | -0.3282617 | -0.1006516 | 0.0000205 | 0.0000520 | 0.0006760 |
THCA | -0.3121091 | -0.1804234 | -0.3174176 | -0.1702747 | 0.0000802 | 0.0001889 | 0.0026452 |
ACC | -0.1519286 | -0.0295340 | -0.1383139 | -0.0155641 | 0.0001409 | 0.0003099 | 0.0046488 |
KICH | -0.2214557 | -0.0743173 | -0.2181215 | -0.0730882 | 0.0008347 | 0.0017217 | 0.0275467 |
UVM | -0.2320786 | -0.1223200 | -0.2385757 | -0.1223627 | 0.0042076 | 0.0081676 | 0.1388492 |
CHOL | -0.0930938 | -0.0123117 | -0.0897872 | -0.0220278 | 0.0055013 | 0.0100856 | 0.1815416 |
LIHC | 0.1268829 | 0.0932007 | 0.1188285 | 0.0970984 | 0.0062439 | 0.0108447 | 0.2060484 |
UCEC | 0.0695230 | 0.0371875 | 0.1857847 | 0.0582634 | 0.0088016 | 0.0145226 | 0.2904519 |
BLCA | 0.2480573 | 0.2168641 | 0.2676753 | 0.2218998 | 0.0244806 | 0.0384695 | 0.8078601 |
PAAD | -0.0245814 | 0.0034962 | -0.0606734 | -0.0039827 | 0.0276393 | 0.0404548 | 0.9120957 |
GBM | 0.0654117 | 0.0423195 | 0.0893722 | 0.0481101 | 0.0281958 | 0.0404548 | 0.9304610 |
ESCA | 0.1561272 | 0.1264551 | 0.1614070 | 0.1288075 | 0.0358150 | 0.0492456 | 1.0000000 |
MESO | -0.1055617 | -0.0286493 | -0.1153821 | -0.0398067 | 0.0461794 | 0.0609568 | 1.0000000 |
LAML | -0.2365280 | -0.1153739 | -0.2523608 | -0.1261346 | 0.1446919 | 0.1836474 | 1.0000000 |
LUAD | 0.2689551 | 0.2465316 | 0.2703211 | 0.2438727 | 0.1840052 | 0.2248952 | 1.0000000 |
TGCT | -0.2812626 | -0.1882902 | -0.3355509 | -0.1894330 | 0.3067936 | 0.3615781 | 1.0000000 |
CESC | 0.1258709 | 0.1200428 | 0.1665097 | 0.1069306 | 0.3278749 | 0.3730990 | 1.0000000 |
KIRC | 0.0663805 | 0.0557844 | 0.0491416 | 0.0508983 | 0.3652635 | 0.4017898 | 1.0000000 |
UCS | 0.0557397 | 0.0317711 | 0.1196058 | 0.0390536 | 0.4450818 | 0.4737968 | 1.0000000 |
DLBC | 0.1791250 | 0.1690399 | 0.1731282 | 0.1549902 | 0.5191938 | 0.5354186 | 1.0000000 |
KIRP | 0.0761933 | 0.0790934 | 0.0441408 | 0.0534016 | 0.8012356 | 0.8012356 | 1.0000000 |
We can plot the mutation counts per Mb divided by ploidy for each tumor type divided into WGD and non-WGD.
ggplot(mut.rates.nonhyper, aes(x=Type, y=PloidyCorrected_MutationPerMb, fill=WGD)) +
geom_violin() +
theme_bw() +
ylab("Non-silent mutations per Mb (log10) / Ploidy") +
xlab("Tumor Type") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
We can also plot each tumor type individually to better see the differences between Ploidy-corrected mutations burden and WGD/non-WGD tumors.
ggplot(mut.rates.nonhyper, aes(x=WGD, y=PloidyCorrected_MutationPerMb, fill=WGD)) +
geom_violin() +
geom_jitter(size = 0.11, color="grey") +
facet_wrap(~ Type) +
theme_bw() +
ylab("Non-silent mutations per Mb (log10) / Ploidy") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())
muts.nonhyper = muts[,rownames(mut.rates.nonhyper)]
muts.nonhyper.min10 = muts.nonhyper[rowSums(muts.nonhyper) > 10,]
res.gene.lr = logistic.regression(muts.nonhyper.min10, ~ WGD + Log10_MutsPerMB + Type, covar = mut.rates.nonhyper, verbose = FALSE)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
res.gene.lr.summary = data.frame(Gene = rownames(res.gene.lr$Estimate), WGD_Estimate=res.gene.lr$Estimate[,"WGDWGD"],
WGD_Zvalue = res.gene.lr$Zvalue[,"WGDWGD"],
WGD_Pvalue = res.gene.lr$Pvalue[,"WGDWGD"],
WGD_FDR = res.gene.lr$FDR[,"WGDWGD"])
res.gene.lr.summary = res.gene.lr.summary[order(res.gene.lr.summary$WGD_Pvalue),]
write.table(res.gene.lr.summary, "LR_WGD_Results_Nonhyper.txt", sep="\t", quote=FALSE, row.names=FALSE)
label = rownames(res.gene.lr.summary)
label[res.gene.lr.summary$WGD_FDR > 0.05] = ""
# Plot with all genes
g1 = ggplot(res.gene.lr.summary, aes(x=WGD_Estimate,
y=-log10(WGD_Pvalue),
color = WGD_FDR < 0.05)) +
geom_point() +
theme_bw() + xlab("Log odds ratio") + ylab("P-value (-log10)") +
theme(panel.grid = element_blank()) +
geom_vline(xintercept = 0, lty=2, color="grey") +
geom_text_repel(label = label) +
ggtitle("Plotting all genes")
# Plot excluding TP53
g2 = ggplot(res.gene.lr.summary[-1,], aes(x=WGD_Estimate,
y=-log10(WGD_Pvalue),
color = WGD_FDR < 0.05)) +
geom_point() +
theme_bw() + xlab("Log odds ratio") + ylab("P-value (-log10)") +
theme(panel.grid = element_blank()) +
geom_vline(xintercept = 0, lty=2, color="grey") +
geom_text_repel(label = label[-1]) +
ggtitle("Excluding TP53 from plot")
plot_grid(g1, g2, labels = "AUTO")
# Show table of results
res.gene.lr.summary.sig = subset(res.gene.lr.summary, WGD_FDR < 0.25)
kable(res.gene.lr.summary.sig, style = 'html', row.names = FALSE) %>%
kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "800px")
Gene | WGD_Estimate | WGD_Zvalue | WGD_Pvalue | WGD_FDR |
---|---|---|---|---|
TP53 | 1.0794584 | 18.785698 | 0.0000000 | 0.0000000 |
PTEN | -0.7672387 | -7.011003 | 0.0000000 | 0.0000000 |
PIK3CA | -0.4786690 | -6.384232 | 0.0000000 | 0.0000000 |
ARID1A | -0.5768852 | -5.660534 | 0.0000000 | 0.0000024 |
FGFR3 | -1.0398106 | -4.826179 | 0.0000014 | 0.0001754 |
CTNNB1 | -0.6158286 | -4.438519 | 0.0000091 | 0.0009511 |
PPP2R1A | 0.8456584 | 4.361786 | 0.0000129 | 0.0011610 |
KMT2B | -0.6214181 | -4.241948 | 0.0000222 | 0.0017450 |
CHD1 | -0.8968124 | -4.063260 | 0.0000484 | 0.0033874 |
HRAS | -0.9992193 | -3.883845 | 0.0001028 | 0.0060679 |
AKT1 | -1.1463076 | -3.859282 | 0.0001137 | 0.0060679 |
SMAD4 | -0.6244750 | -3.855318 | 0.0001156 | 0.0060679 |
KMT2C | -0.3408770 | -3.763359 | 0.0001676 | 0.0081244 |
CTCF | -0.7029485 | -3.649443 | 0.0002628 | 0.0118264 |
RNF43 | -0.6779476 | -3.335752 | 0.0008507 | 0.0357290 |
JAK1 | -0.7668422 | -3.297457 | 0.0009756 | 0.0384161 |
CDH1 | -0.4449398 | -3.058587 | 0.0022238 | 0.0824127 |
FAS | -1.3759040 | -2.933289 | 0.0033539 | 0.1079772 |
BCOR | -0.4626777 | -2.929062 | 0.0033999 | 0.1079772 |
HLA-A | -0.8536875 | -2.900500 | 0.0037257 | 0.1079772 |
EIF1AX | -1.7996105 | -2.881124 | 0.0039626 | 0.1079772 |
BRAF | -0.3709148 | -2.877323 | 0.0040107 | 0.1079772 |
C1orf168 | -0.6412355 | -2.874089 | 0.0040519 | 0.1079772 |
TGFBR2 | -0.6098609 | -2.869330 | 0.0041134 | 0.1079772 |
MAP2K4 | -0.5989836 | -2.811855 | 0.0049257 | 0.1241270 |
CHD8 | -0.4413577 | -2.784141 | 0.0053670 | 0.1300458 |
CREBBP | -0.3499356 | -2.754055 | 0.0058862 | 0.1373444 |
ZEB2 | 0.4240420 | 2.734583 | 0.0062459 | 0.1384352 |
RBM10 | -0.5650726 | -2.727977 | 0.0063724 | 0.1384352 |
KANSL1 | -0.4999477 | -2.676508 | 0.0074394 | 0.1496449 |
BTBD11 | -0.5402268 | -2.673664 | 0.0075028 | 0.1496449 |
MPP7 | -0.6284041 | -2.662284 | 0.0077612 | 0.1496449 |
MAP3K1 | -0.4268961 | -2.658946 | 0.0078385 | 0.1496449 |
RHOA | -0.6398340 | -2.646768 | 0.0081265 | 0.1505796 |
FOXA2 | 0.6912303 | 2.569411 | 0.0101872 | 0.1833690 |
BCORL1 | -0.4154184 | -2.559519 | 0.0104817 | 0.1834301 |
MAP7D3 | -0.6591130 | -2.523561 | 0.0116173 | 0.1959742 |
PIK3R1 | -0.3827401 | -2.517453 | 0.0118207 | 0.1959742 |
CCAR1 | -0.5882731 | -2.508280 | 0.0121321 | 0.1959794 |
NF1 | -0.2652156 | -2.487090 | 0.0128793 | 0.2011616 |
ADNP2 | -0.5774797 | -2.469646 | 0.0135247 | 0.2011616 |
ZMYM2 | -0.5655137 | -2.464682 | 0.0137135 | 0.2011616 |
RPL22 | -0.8136788 | -2.464249 | 0.0137301 | 0.2011616 |
SPTA1 | 0.2170192 | 2.440063 | 0.0146847 | 0.2084574 |
LZTR1 | -0.5395786 | -2.432198 | 0.0150075 | 0.2084574 |
SYNE1 | 0.1922635 | 2.423399 | 0.0153760 | 0.2084574 |
KMT2D | -0.2205041 | -2.419273 | 0.0155516 | 0.2084574 |
CIC | -0.3549505 | -2.406418 | 0.0161098 | 0.2114413 |
SPTAN1 | -0.3737495 | -2.381084 | 0.0172618 | 0.2219374 |
tumor.non.tp53 = colnames(muts.nonhyper)[muts.nonhyper["TP53",] == 0]
i = intersect(intersect(rownames(mut.rates.nonhyper), colnames(muts.nonhyper)), tumor.non.tp53)
mut.rates.nonTP53 = mut.rates.nonhyper[i,]
muts.nonTP53 = muts.nonhyper[,i]
muts.nonTP53.min10 = muts.nonTP53[rowSums(muts.nonTP53) > 10,]
res.gene.lr.nonTP53 = logistic.regression(muts.nonTP53.min10, ~ WGD + Log10_MutsPerMB + Type, covar = mut.rates.nonTP53, verbose = FALSE)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
res.gene.lr.nonTP53.summary = data.frame(Gene = rownames(res.gene.lr.nonTP53$Estimate), WGD_Estimate=res.gene.lr.nonTP53$Estimate[,"WGDWGD"],
WGD_Zvalue = res.gene.lr.nonTP53$Zvalue[,"WGDWGD"],
WGD_Pvalue = res.gene.lr.nonTP53$Pvalue[,"WGDWGD"],
WGD_FDR = res.gene.lr.nonTP53$FDR[,"WGDWGD"])
res.gene.lr.nonTP53.summary = res.gene.lr.nonTP53.summary[order(res.gene.lr.nonTP53.summary$WGD_Pvalue),]
write.table(res.gene.lr.nonTP53.summary, "LR_WGD_Results_Nonhyper_nonTP53mutated.txt", sep="\t", quote=FALSE, row.names=FALSE)
label = rep("", nrow(res.gene.lr.nonTP53.summary))
top = head(order(res.gene.lr.summary$WGD_Pvalue), 10)
label[top] = rownames(res.gene.lr.nonTP53.summary)[top]
# Plot with all significant genes
g3 = ggplot(res.gene.lr.nonTP53.summary, aes(x=WGD_Estimate,
y=-log10(WGD_Pvalue),
color = WGD_FDR < 0.05)) +
geom_point() +
theme_bw() + xlab("Log odds ratio") + ylab("P-value (-log10)") +
theme(panel.grid = element_blank()) +
geom_vline(xintercept = 0, lty=2, color="grey") +
geom_text_repel(label = label) +
xlim(-3, 3) +
ggtitle("Including only non-TP53 mutated samples")
g3
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_text_repel).
# Show table of results
res.gene.lr.nonTP53.summary.sig = subset(res.gene.lr.nonTP53.summary, WGD_Pvalue < 0.05)
kable(res.gene.lr.nonTP53.summary.sig, style = 'html', row.names = FALSE) %>%
kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "800px")
Gene | WGD_Estimate | WGD_Zvalue | WGD_Pvalue | WGD_FDR |
---|---|---|---|---|
CREBBP | -0.7573872 | -3.330352 | 0.0008674 | 0.2014334 |
RNF43 | -1.3991071 | -3.320763 | 0.0008977 | 0.2014334 |
SMAD4 | -1.0149265 | -3.233038 | 0.0012248 | 0.2014334 |
PTEN | -0.5280733 | -3.167605 | 0.0015370 | 0.2014334 |
PIK3CA | -0.3511188 | -3.150062 | 0.0016324 | 0.2014334 |
ADNP2 | -1.2013345 | -3.021193 | 0.0025178 | 0.2589149 |
HRAS | -1.0698854 | -2.837505 | 0.0045468 | 0.4007647 |
RHOA | -1.1729272 | -2.747925 | 0.0059974 | 0.4381836 |
PPP2R2D | 1.9900174 | 2.645439 | 0.0081585 | 0.4381836 |
DNAH7 | -0.4551239 | -2.615006 | 0.0089226 | 0.4381836 |
BRSK1 | -1.4725203 | -2.612695 | 0.0089831 | 0.4381836 |
FGFR3 | -0.7298253 | -2.561275 | 0.0104289 | 0.4381836 |
JAK1 | -0.8860764 | -2.518212 | 0.0117952 | 0.4381836 |
MTOR | -0.5911766 | -2.496233 | 0.0125520 | 0.4381836 |
MYO9A | -0.6088333 | -2.493545 | 0.0126475 | 0.4381836 |
BTBD11 | -0.7323884 | -2.449820 | 0.0142928 | 0.4381836 |
C1orf168 | -0.7711517 | -2.431809 | 0.0150236 | 0.4381836 |
EPAS1 | 0.8110039 | 2.389321 | 0.0168795 | 0.4381836 |
SMARCA4 | -0.5568790 | -2.385988 | 0.0170333 | 0.4381836 |
RBM10 | -0.7449828 | -2.363704 | 0.0180932 | 0.4381836 |
IGSF9 | 0.6854331 | 2.349618 | 0.0187927 | 0.4381836 |
PIK3R1 | -0.6434924 | -2.343274 | 0.0191154 | 0.4381836 |
BAP1 | 0.4920152 | 2.322928 | 0.0201830 | 0.4381836 |
GPS2 | -1.2829478 | -2.312851 | 0.0207308 | 0.4381836 |
CUBN | 0.3787204 | 2.305774 | 0.0211233 | 0.4381836 |
ZMYM2 | -0.8824764 | -2.293850 | 0.0217991 | 0.4381836 |
USP9X | -0.6262422 | -2.290533 | 0.0219904 | 0.4381836 |
KANSL1 | -0.6683929 | -2.283124 | 0.0224231 | 0.4381836 |
XYLT2 | -1.4678371 | -2.281967 | 0.0224913 | 0.4381836 |
ME3 | -1.0261674 | -2.281916 | 0.0224943 | 0.4381836 |
GTF2I | -0.9810036 | -2.280922 | 0.0225531 | 0.4381836 |
KMT2C | -0.3141781 | -2.276460 | 0.0228185 | 0.4381836 |
STK11 | 0.6451324 | 2.266250 | 0.0234361 | 0.4381836 |
KMT2B | -0.5109378 | -2.245148 | 0.0247586 | 0.4400067 |
HIST1H3B | -1.0435388 | -2.242023 | 0.0249599 | 0.4400067 |
PPP2R1A | 0.7656459 | 2.204708 | 0.0274746 | 0.4708838 |
RASA1 | -0.7878640 | -2.183765 | 0.0289795 | 0.4832525 |
EEF1A1 | -0.7900353 | -2.136364 | 0.0326497 | 0.5291934 |
CHD1 | -0.7010803 | -2.120317 | 0.0339793 | 0.5291934 |
COL7A1 | -0.4356894 | -2.116439 | 0.0343075 | 0.5291934 |
LMAN1 | -1.3288244 | -2.097471 | 0.0359519 | 0.5410323 |
CDH11 | 0.5438248 | 2.075624 | 0.0379288 | 0.5475859 |
ARID1A | -0.3035796 | -2.071136 | 0.0383461 | 0.5475859 |
MAX | -1.5897197 | -2.063661 | 0.0390499 | 0.5475859 |
ALYREF | 1.2996694 | 2.050780 | 0.0402883 | 0.5523979 |
CTCF | -0.6075972 | -2.039023 | 0.0414478 | 0.5559408 |
ELMSAN1 | -0.8550667 | -2.006401 | 0.0448135 | 0.5882962 |
HLA-A | -1.1269787 | -1.988120 | 0.0467985 | 0.5968917 |
RPL22 | -1.0262717 | -1.982681 | 0.0474031 | 0.5968917 |
tumor.tp53 = colnames(muts.nonhyper)[muts.nonhyper["TP53",] > 0]
i = intersect(intersect(rownames(mut.rates.nonhyper), colnames(muts.nonhyper)), tumor.tp53)
# Determine the number of tumor types that will have at least 5 tumors
# after subsetting to TP53 mutants
type.table = table(mut.rates.nonhyper[i,"Type"])
type.n = names(type.table)[type.table > 4]
tumor.type.n = rownames(subset(mut.rates.nonhyper, Type %in% type.n))
i2 = intersect(intersect(intersect(rownames(mut.rates.nonhyper), colnames(muts.nonhyper)), tumor.tp53), tumor.type.n)
mut.rates.TP53 = mut.rates.nonhyper[i2,]
mut.rates.TP53$Type = droplevels(mut.rates.TP53$Type)
muts.TP53 = muts.nonhyper[,i2]
muts.TP53.min10 = muts.TP53[rowSums(muts.TP53) > 10,]
res.gene.lr.TP53 = logistic.regression(muts.TP53.min10, ~ WGD + Log10_MutsPerMB + Type, covar = mut.rates.TP53, verbose = FALSE)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
res.gene.lr.TP53.summary = data.frame(Gene = rownames(res.gene.lr.TP53$Estimate), WGD_Estimate=res.gene.lr.TP53$Estimate[,"WGDWGD"],
WGD_Zvalue = res.gene.lr.TP53$Zvalue[,"WGDWGD"],
WGD_Pvalue = res.gene.lr.TP53$Pvalue[,"WGDWGD"],
WGD_FDR = res.gene.lr.TP53$FDR[,"WGDWGD"])
res.gene.lr.TP53.summary = res.gene.lr.TP53.summary[order(res.gene.lr.TP53.summary$WGD_Pvalue),]
write.table(res.gene.lr.TP53.summary, "LR_WGD_Results_Nonhyper_TP53mutated.txt", sep="\t", quote=FALSE, row.names=FALSE)
label = rep("", nrow(res.gene.lr.TP53.summary))
top = head(order(res.gene.lr.summary$WGD_Pvalue), 10)
label[top] = rownames(res.gene.lr.TP53.summary)[top]
# Plot with all significant genes
g4 = ggplot(res.gene.lr.TP53.summary, aes(x=WGD_Estimate,
y=-log10(WGD_Pvalue),
color = WGD_FDR < 0.05)) +
geom_point() +
theme_bw() + xlab("Log odds ratio") + ylab("P-value (-log10)") +
theme(panel.grid = element_blank()) +
geom_vline(xintercept = 0, lty=2, color="grey") +
geom_text_repel(label = label) +
xlim(-5, 5) +
ggtitle("Including only TP53 mutated samples")
g4
# Show table of results
res.gene.lr.nonTP53.summary.sig = subset(res.gene.lr.nonTP53.summary, WGD_Pvalue < 0.05)
kable(res.gene.lr.nonTP53.summary.sig, style = 'html', row.names = FALSE) %>%
kable_styling(bootstrap_options = "striped") %>%
scroll_box(width = "100%", height = "800px")
Gene | WGD_Estimate | WGD_Zvalue | WGD_Pvalue | WGD_FDR |
---|---|---|---|---|
CREBBP | -0.7573872 | -3.330352 | 0.0008674 | 0.2014334 |
RNF43 | -1.3991071 | -3.320763 | 0.0008977 | 0.2014334 |
SMAD4 | -1.0149265 | -3.233038 | 0.0012248 | 0.2014334 |
PTEN | -0.5280733 | -3.167605 | 0.0015370 | 0.2014334 |
PIK3CA | -0.3511188 | -3.150062 | 0.0016324 | 0.2014334 |
ADNP2 | -1.2013345 | -3.021193 | 0.0025178 | 0.2589149 |
HRAS | -1.0698854 | -2.837505 | 0.0045468 | 0.4007647 |
RHOA | -1.1729272 | -2.747925 | 0.0059974 | 0.4381836 |
PPP2R2D | 1.9900174 | 2.645439 | 0.0081585 | 0.4381836 |
DNAH7 | -0.4551239 | -2.615006 | 0.0089226 | 0.4381836 |
BRSK1 | -1.4725203 | -2.612695 | 0.0089831 | 0.4381836 |
FGFR3 | -0.7298253 | -2.561275 | 0.0104289 | 0.4381836 |
JAK1 | -0.8860764 | -2.518212 | 0.0117952 | 0.4381836 |
MTOR | -0.5911766 | -2.496233 | 0.0125520 | 0.4381836 |
MYO9A | -0.6088333 | -2.493545 | 0.0126475 | 0.4381836 |
BTBD11 | -0.7323884 | -2.449820 | 0.0142928 | 0.4381836 |
C1orf168 | -0.7711517 | -2.431809 | 0.0150236 | 0.4381836 |
EPAS1 | 0.8110039 | 2.389321 | 0.0168795 | 0.4381836 |
SMARCA4 | -0.5568790 | -2.385988 | 0.0170333 | 0.4381836 |
RBM10 | -0.7449828 | -2.363704 | 0.0180932 | 0.4381836 |
IGSF9 | 0.6854331 | 2.349618 | 0.0187927 | 0.4381836 |
PIK3R1 | -0.6434924 | -2.343274 | 0.0191154 | 0.4381836 |
BAP1 | 0.4920152 | 2.322928 | 0.0201830 | 0.4381836 |
GPS2 | -1.2829478 | -2.312851 | 0.0207308 | 0.4381836 |
CUBN | 0.3787204 | 2.305774 | 0.0211233 | 0.4381836 |
ZMYM2 | -0.8824764 | -2.293850 | 0.0217991 | 0.4381836 |
USP9X | -0.6262422 | -2.290533 | 0.0219904 | 0.4381836 |
KANSL1 | -0.6683929 | -2.283124 | 0.0224231 | 0.4381836 |
XYLT2 | -1.4678371 | -2.281967 | 0.0224913 | 0.4381836 |
ME3 | -1.0261674 | -2.281916 | 0.0224943 | 0.4381836 |
GTF2I | -0.9810036 | -2.280922 | 0.0225531 | 0.4381836 |
KMT2C | -0.3141781 | -2.276460 | 0.0228185 | 0.4381836 |
STK11 | 0.6451324 | 2.266250 | 0.0234361 | 0.4381836 |
KMT2B | -0.5109378 | -2.245148 | 0.0247586 | 0.4400067 |
HIST1H3B | -1.0435388 | -2.242023 | 0.0249599 | 0.4400067 |
PPP2R1A | 0.7656459 | 2.204708 | 0.0274746 | 0.4708838 |
RASA1 | -0.7878640 | -2.183765 | 0.0289795 | 0.4832525 |
EEF1A1 | -0.7900353 | -2.136364 | 0.0326497 | 0.5291934 |
CHD1 | -0.7010803 | -2.120317 | 0.0339793 | 0.5291934 |
COL7A1 | -0.4356894 | -2.116439 | 0.0343075 | 0.5291934 |
LMAN1 | -1.3288244 | -2.097471 | 0.0359519 | 0.5410323 |
CDH11 | 0.5438248 | 2.075624 | 0.0379288 | 0.5475859 |
ARID1A | -0.3035796 | -2.071136 | 0.0383461 | 0.5475859 |
MAX | -1.5897197 | -2.063661 | 0.0390499 | 0.5475859 |
ALYREF | 1.2996694 | 2.050780 | 0.0402883 | 0.5523979 |
CTCF | -0.6075972 | -2.039023 | 0.0414478 | 0.5559408 |
ELMSAN1 | -0.8550667 | -2.006401 | 0.0448135 | 0.5882962 |
HLA-A | -1.1269787 | -1.988120 | 0.0467985 | 0.5968917 |
RPL22 | -1.0262717 | -1.982681 | 0.0474031 | 0.5968917 |
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] ggrepel_0.8.2 kableExtra_1.1.0 knitr_1.28 magrittr_1.5
## [5] cowplot_1.0.0 ggplot2_3.3.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.4.6 highr_0.8 pillar_1.4.4 compiler_3.6.1
## [5] tools_3.6.1 digest_0.6.25 viridisLite_0.3.0 evaluate_0.14
## [9] lifecycle_0.2.0 tibble_3.0.1 gtable_0.3.0 pkgconfig_2.0.3
## [13] rlang_0.4.6 rstudioapi_0.11 yaml_2.2.1 xfun_0.13
## [17] withr_2.2.0 stringr_1.4.0 httr_1.4.1 xml2_1.3.2
## [21] vctrs_0.2.4 hms_0.5.3 webshot_0.5.2 grid_3.6.1
## [25] glue_1.4.0 R6_2.4.1 rmarkdown_2.1 farver_2.0.3
## [29] readr_1.3.1 scales_1.1.0 ellipsis_0.3.0 htmltools_0.4.0
## [33] rvest_0.3.5 colorspace_1.4-1 renv_0.10.0 labeling_0.3
## [37] stringi_1.4.6 munsell_0.5.0 crayon_1.3.4