Mutational signatures and exposures are discovered using methods such as Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF). These algorithms deconvolute a matrix of mutation-type counts for each sample into two matrices: 1) a "signature" matrix containing the probability of each mutation type in each signature and 2) an "exposure" matrix containing the estimated counts for each signature in each sample. Before signature discovery can be performed, the variants from each sample must first be stored in a musica object using the create_musica function, and mutation count tables must be created using functions such as build_standard_table.
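To make the decomposition concrete, the base-R sketch below factorizes a toy count matrix (motifs in rows, samples in columns) as counts ≈ signatures %*% exposures. It is an illustrative stand-in, not the code that discover_signatures() runs: it assumes Lee & Seung's multiplicative NMF updates for the Kullback-Leibler divergence, and the helper name nmf_kl is hypothetical. After fitting, each signature column is rescaled to sum to 1 so it reads as a probability distribution over mutation types, and the exposures are scaled by the same factors so their product is unchanged.

```r
# Hypothetical sketch, not musicatk's internal code: NMF with Lee & Seung
# multiplicative updates for the Kullback-Leibler divergence (base R only).
nmf_kl <- function(counts, k, n_iter = 500, eps = 1e-10) {
  n_motifs <- nrow(counts)
  n_samples <- ncol(counts)
  # Random non-negative starting values
  W <- matrix(runif(n_motifs * k), n_motifs, k)    # motifs x signatures
  H <- matrix(runif(k * n_samples), k, n_samples)  # signatures x samples
  for (i in seq_len(n_iter)) {
    # Exposure update: row a of H is divided by colSums(W)[a]
    H <- H * (t(W) %*% (counts / (W %*% H + eps))) / (colSums(W) + eps)
    # Signature update: column a of W is divided by rowSums(H)[a]
    W <- W * sweep((counts / (W %*% H + eps)) %*% t(H), 2, rowSums(H) + eps, "/")
  }
  # Rescale so each signature column sums to 1; W %*% H is unchanged
  sig_totals <- colSums(W)
  signatures <- sweep(W, 2, sig_totals, "/")
  exposures <- H * sig_totals  # row a of H multiplied by sig_totals[a]
  sig_names <- paste0("Signature", seq_len(k))
  dimnames(signatures) <- list(rownames(counts), sig_names)
  dimnames(exposures) <- list(sig_names, colnames(counts))
  list(signatures = signatures, exposures = exposures)
}

# Toy data: 96 motifs x 7 samples of random counts (illustration only)
set.seed(1)
counts <- matrix(rpois(96 * 7, lambda = 3), nrow = 96, ncol = 7)
fit <- nmf_kl(counts, k = 3)
colSums(fit$signatures)  # each signature is a distribution over motifs
```

Because the result depends on the random starting values, repeated fits can settle in different local optima, which is what the seed and nstart arguments are there to control.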

discover_signatures(
  musica,
  table_name,
  num_signatures,
  algorithm = "lda",
  seed = 1,
  nstart = 10,
  par_cores = 1
)

Arguments

musica

A musica object.

table_name

Name of the table to use for signature discovery. Needs to be the same name supplied to the table building functions such as build_standard_table.

num_signatures

Number of signatures to discover.

algorithm

Method to use for mutational signature discovery. One of "lda" or "nmf". Default "lda".

seed

Seed to be used for the random number generators in the signature discovery algorithms. Default 1.

nstart

Number of independent random starts used in the mutational signature algorithms. Default 10.

par_cores

Number of parallel cores to use. Only used if algorithm = "nmf". Default 1.
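The seed, nstart, and par_cores arguments follow a common pattern for stochastic fitting: draw reproducible random starts, fit each one independently (possibly in parallel), and keep the best. The base-R sketch below illustrates that pattern under stated assumptions: the helper best_of_starts() is hypothetical, the objective is assumed to be "lower is better" (as for an NMF divergence; an LDA likelihood would be maximized instead), and stats::kmeans with a single start stands in for one signature-discovery fit. How musicatk distributes its starts internally may differ.

```r
# Hypothetical helper, not part of musicatk: run `nstart` independent random
# starts of a stochastic fitting function, optionally on several cores, and
# keep the fit with the lowest objective value.
library(parallel)

best_of_starts <- function(fit_once, nstart = 10, seed = 1, par_cores = 1) {
  # Draw one seed per start up front so results do not depend on par_cores
  set.seed(seed)
  start_seeds <- sample.int(.Machine$integer.max, nstart)
  run_start <- function(s) {
    set.seed(s)
    fit_once()
  }
  # mclapply forks worker processes; on Windows only mc.cores = 1 is allowed
  fits <- mclapply(start_seeds, run_start, mc.cores = par_cores)
  objectives <- vapply(fits, function(f) f$objective, numeric(1))
  fits[[which.min(objectives)]]
}

# Stand-in fit: k-means with a single random start on toy 2-D data
set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
fit_once <- function() {
  km <- kmeans(x, centers = 3, nstart = 1)
  list(model = km, objective = km$tot.withinss)
}
best <- best_of_starts(fit_once, nstart = 10, seed = 12345, par_cores = 1)
best$objective
```

Generating the per-start seeds in the main process is a design choice that keeps the selected fit identical whether it runs on one core or several.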

Value

Returns a musica_result object containing signatures and exposures.
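Each column of the exposure matrix describes one sample. A common next step, sketched below in base R, is to convert the estimated counts into the fraction of each sample's mutations attributed to each signature. The three columns are copied from the example output on this page; in a live session, since the printed object lists an exposures slot, the full matrix could presumably be read with result@exposures, where result is a hypothetical name for the object returned by discover_signatures(). Multiplying the signature matrix by the exposure matrix (signatures %*% exposures) would likewise give the model's expected count of each mutation type in each sample.

```r
# Exposures for three samples, copied from the example output on this page
exposures <- matrix(
  c(0.05077393, 46.89845213, 0.05077393,     # TCGA-56-7582-01A-11D-2042-08
    5.061058, 14.057990, 9.880952,           # TCGA-94-7557-01A-11D-2122-08
    121.89824941, 0.05087529, 0.05087529),   # TCGA-EE-A3J5-06A-11D-A20D-08
  nrow = 3,
  dimnames = list(paste0("Signature", 1:3),
                  c("TCGA-56-7582-01A-11D-2042-08",
                    "TCGA-94-7557-01A-11D-2122-08",
                    "TCGA-EE-A3J5-06A-11D-A20D-08")))

# Total estimated mutation count per sample (column sums)
colSums(exposures)

# Fraction of each sample's mutations attributed to each signature
exposure_fraction <- prop.table(exposures, margin = 2)
round(exposure_fraction, 3)
```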

Examples

data(musica)
g <- select_genome("19")
build_standard_table(musica, g, "SBS96", overwrite = TRUE)
#> Building count table from SBS with SBS96 schema
#> Warning: Overwriting counts table: SBS96
discover_signatures(musica = musica, table_name = "SBS96",
                    num_signatures = 3, algorithm = "lda", seed = 12345, nstart = 1)
#> An object of class "musica_result"
#> Slot "signatures":
#>           Signature1   Signature2   Signature3
#> C>A_ACA 3.214412e-22 4.257622e-02 4.320778e-02
#> C>A_ACC 7.068686e-72 3.378749e-02 7.273860e-57
#> C>A_ACG 1.696120e-76 5.631249e-03 3.801190e-79
#> C>A_ACT 1.439528e-70 1.689375e-02 3.139490e-54
#> C>A_CCA 2.443956e-69 2.815624e-02 1.256298e-02
#> C>A_CCC 1.554471e-69 2.815624e-02 2.512596e-02
#> C>A_CCG 6.265378e-03 1.622837e-02 5.081727e-05
#> C>A_CCT 9.750270e-19 3.378747e-02 3.962780e-08
#> C>A_GCA 6.265689e-03 1.623538e-02 3.449016e-05
#> C>A_GCC 5.661380e-72 1.126250e-02 1.256298e-02
#> C>A_GCG 1.441197e-67 1.126250e-02 4.896510e-48
#> C>A_GCT 4.251600e-71 3.941874e-02 8.310749e-55
#> C>A_TCA 1.079587e-75 2.815624e-02 2.030294e-77
#> C>A_TCC 6.227899e-72 1.689375e-02 2.512596e-02
#> C>A_TCG 2.917076e-72 2.252499e-02 2.383283e-56
#> C>A_TCT 1.040964e-63 3.378749e-02 1.256298e-02
#> C>G_ACA 8.982685e-72 1.689375e-02 4.340776e-55
#> C>G_ACC 3.332204e-14 1.023482e-02 2.292690e-03
#> C>G_ACG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_ACT 4.790181e-16 1.602187e-02 2.707106e-02
#> C>G_CCA 5.966718e-72 5.631249e-03 1.256298e-02
#> C>G_CCC 9.067051e-17 2.252476e-02 5.134283e-07
#> C>G_CCG 5.623647e-03 5.631249e-03 2.985272e-45
#> C>G_CCT 2.341534e-13 2.052206e-02 4.468438e-03
#> C>G_GCA 3.673864e-72 2.815624e-02 4.208045e-57
#> C>G_GCC 5.477960e-80 1.236855e-78 1.256298e-02
#> C>G_GCG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_GCT 1.298768e-79 3.551297e-78 5.025193e-02
#> C>G_TCA 3.448376e-76 1.126250e-02 8.959530e-78
#> C>G_TCC 8.532596e-21 1.315044e-02 3.347705e-02
#> C>G_TCG 3.720076e-44 3.720076e-44 3.720076e-44
#> C>G_TCT 1.559492e-17 7.339826e-03 2.131423e-02
#> C>T_ACA 9.283284e-02 1.860131e-02 1.493970e-02
#> C>T_ACC 1.687094e-02 4.660678e-76 1.256298e-02
#> C>T_ACG 1.687094e-02 5.605620e-76 2.512596e-02
#> C>T_ACT 6.951015e-02 1.850334e-02 2.957119e-02
#> C>T_CCA 7.873106e-02 1.689375e-02 6.554712e-69
#> C>T_CCC 6.318646e-02 3.604767e-02 4.224660e-02
#> C>T_CCG 1.124729e-02 5.631814e-84 3.067592e-77
#> C>T_CCT 1.007431e-01 2.657566e-02 1.716718e-02
#> C>T_GCA 6.186012e-02 2.252499e-02 2.512596e-02
#> C>T_GCC 5.789264e-02 1.231471e-02 4.420470e-02
#> C>T_GCG 5.623647e-03 2.469049e-75 1.256298e-02
#> C>T_GCT 4.210611e-02 1.396361e-02 3.810356e-02
#> C>T_TCA 4.634952e-02 3.704481e-02 1.482014e-02
#> C>T_TCC 1.008866e-01 3.337747e-02 5.192405e-02
#> C>T_TCG 1.124729e-02 5.346482e-78 4.813211e-68
#> C>T_TCT 6.186012e-02 1.689375e-02 1.256298e-02
#> T>A_ATA 4.222375e-70 1.126250e-02 1.127285e-52
#> T>A_ATC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_ATG 1.244484e-69 1.126250e-02 1.799990e-51
#> T>A_ATT 5.623647e-03 5.631249e-03 1.256298e-02
#> T>A_CTA 4.540720e-68 1.126250e-02 1.559780e-48
#> T>A_CTC 2.329443e-68 1.126250e-02 1.256298e-02
#> T>A_CTG 1.583880e-69 1.126250e-02 2.512596e-02
#> T>A_CTT 5.623647e-03 5.631249e-03 1.256298e-02
#> T>A_GTA 6.742502e-71 2.212938e-67 2.512596e-02
#> T>A_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_GTG 5.623647e-03 5.631249e-03 8.389862e-71
#> T>A_GTT 3.720076e-44 3.720076e-44 3.720076e-44
#> T>A_TTA 3.534605e-76 1.689375e-02 1.112954e-76
#> T>A_TTC 5.623647e-03 5.806495e-76 1.256298e-02
#> T>A_TTG 5.623647e-03 1.485272e-75 1.256298e-02
#> T>A_TTT 1.687094e-02 1.689375e-02 2.512596e-02
#> T>C_ATA 1.100756e-13 6.228549e-08 1.256284e-02
#> T>C_ATC 2.279139e-68 1.126250e-02 1.256298e-02
#> T>C_ATG 1.296000e-02 9.303956e-03 5.432740e-04
#> T>C_ATT 5.837458e-03 2.529910e-02 3.102243e-02
#> T>C_CTA 4.866067e-76 5.631249e-03 6.309342e-77
#> T>C_CTC 1.124729e-02 5.631249e-03 1.439249e-71
#> T>C_CTG 6.659905e-66 5.631249e-03 2.512596e-02
#> T>C_CTT 1.230529e-02 2.999234e-12 2.276246e-02
#> T>C_GTA 3.619894e-71 2.772732e-66 2.512596e-02
#> T>C_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>C_GTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>C_GTT 5.623647e-03 7.333095e-76 1.256298e-02
#> T>C_TTA 1.687094e-02 5.631249e-03 4.987590e-72
#> T>C_TTC 6.074431e-03 7.222243e-03 2.056952e-02
#> T>C_TTG 1.979686e-68 5.631249e-03 6.494058e-49
#> T>C_TTT 5.623647e-03 8.877562e-76 1.256298e-02
#> T>G_ATA 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_ATC 5.623647e-03 3.239622e-73 1.256298e-02
#> T>G_ATG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_ATT 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_CTA 2.440947e-78 1.427290e-77 1.256298e-02
#> T>G_CTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_CTG 5.623647e-03 1.120463e-83 2.578236e-76
#> T>G_CTT 9.044721e-72 5.631249e-03 2.512596e-02
#> T>G_GTA 8.002118e-77 5.631249e-03 7.430059e-78
#> T>G_GTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_GTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_GTT 7.320200e-69 5.631249e-03 1.047583e-49
#> T>G_TTA 1.124729e-02 4.725050e-80 1.226595e-70
#> T>G_TTC 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_TTG 3.720076e-44 3.720076e-44 3.720076e-44
#> T>G_TTT 3.720076e-44 3.720076e-44 3.720076e-44
#> 
#> Slot "exposures":
#>            TCGA-56-7582-01A-11D-2042-08 TCGA-77-7335-01A-11D-2042-08
#> Signature1                   0.05077393                   0.05080516
#> Signature2                  46.89845213                   0.05080516
#> Signature3                   0.05077393                  57.89838968
#>            TCGA-94-7557-01A-11D-2122-08 TCGA-97-7938-01A-11D-2167-08
#> Signature1                     5.061058                   0.05087258
#> Signature2                    14.057990                 116.89825485
#> Signature3                     9.880952                   0.05087258
#>            TCGA-EE-A3J5-06A-11D-A20D-08 TCGA-ER-A197-06A-32D-A197-08
#> Signature1                 121.89824941                   0.05024105
#> Signature2                   0.05087529                   0.05024105
#> Signature3                   0.05087529                  10.89951790
#>            TCGA-ER-A19O-06A-11D-A197-08
#> Signature1                  50.89842631
#> Signature2                   0.05078684
#> Signature3                   0.05078684
#> 
#> Slot "table_name":
#> [1] "SBS96"
#> 
#> Slot "algorithm":
#> [1] "LDA"
#> 
#> Slot "musica":
#> An object of class "musica"
#> Slot "variants":
#>        chr     start       end ref alt                       sample
#>   1:  chr1  11020563  11020563   C   A TCGA-94-7557-01A-11D-2122-08
#>   2:  chr1  43430030  43430030   G   T TCGA-94-7557-01A-11D-2122-08
#>   3:  chr1  58682403  58682403   A   G TCGA-94-7557-01A-11D-2122-08
#>   4:  chr1 109508295 109508295   C   G TCGA-94-7557-01A-11D-2122-08
#>   5:  chr1 156384826 156384826   A   C TCGA-94-7557-01A-11D-2122-08
#>  ---                                                               
#> 907: chr19  54885292  54885292   G   A TCGA-ER-A19O-06A-11D-A197-08
#> 908: chr20  49374328  49374328   A   T TCGA-ER-A19O-06A-11D-A197-08
#> 909:  chrX  73213768  73213768   T   G TCGA-ER-A19O-06A-11D-A197-08
#> 910:  chrX 101292834 101292834   G   A TCGA-ER-A19O-06A-11D-A197-08
#> 911:  chrX 107526690 107526690   C   T TCGA-ER-A19O-06A-11D-A197-08
#>      Variant_Type
#>   1:          SBS
#>   2:          SBS
#>   3:          SBS
#>   4:          SBS
#>   5:          SBS
#>  ---             
#> 907:          SBS
#> 908:          SBS
#> 909:          SBS
#> 910:          SBS
#> 911:          SBS
#> 
#> Slot "count_tables":
#> $SBS96
#> Count_Table:  SBS96 
#> Motifs: 96 
#> Samples: 7 
#>  
#> **Annotations: 
#>            motif mutation context
#> C>A_ACA C>A_ACA      C>A     ACA
#> C>A_ACC C>A_ACC      C>A     ACC
#> C>A_ACG C>A_ACG      C>A     ACG
#> C>A_ACT C>A_ACT      C>A     ACT
#> C>A_CCA C>A_CCA      C>A     CCA
#> C>A_CCC C>A_CCC      C>A     CCC
#> 7           ...      ...     ... 
#> 
#> **Features: 
#>    mutation
#> 1  C>A_CCG
#> 2  C>A_GCA
#> 3  T>C_ATG
#> 4  C>G_TGC
#> 5  T>G_AGC
#> 6  C>G_CCT
#> 7      ... 
#> 
#> **Types: 
#>  SBS
#>  
#> **Color Variable: 
#>  mutation
#>  
#> **Color Mapping: 
#>  #5ABCEBFF
#>  #050708FF
#>  #D33C32FF
#>  #CBCACBFF
#>  #ABCD72FF
#>  #E7C9C6FF
#>  
#> **Descriptions: 
#>  Single Base Substitution table with one base upstream and downstream
#> 
#> 
#> Slot "sample_annotations":
#>                         Samples Tumor_Subtypes
#> 1: TCGA-94-7557-01A-11D-2122-08           Lung
#> 2: TCGA-56-7582-01A-11D-2042-08           Lung
#> 3: TCGA-77-7335-01A-11D-2042-08           Lung
#> 4: TCGA-97-7938-01A-11D-2167-08           Lung
#> 5: TCGA-EE-A3J5-06A-11D-A20D-08         Breast
#> 6: TCGA-ER-A197-06A-32D-A197-08         Breast
#> 7: TCGA-ER-A19O-06A-11D-A197-08         Breast
#> 
#> 
#> Slot "umap":
#> <0 x 0 matrix>
#>