Uses the celda_G model to cluster features into modules for a range of possible L's. The module labels of the previous "L-1" model are used as the initial values in the current model with L modules. The best split of an existing module is found to create the L-th module. This procedure is much faster than randomly initializing each model with a different L.

recursiveSplitModule(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  initialL = 10,
  maxL = 100,
  tempK = 100,
  zInit = NULL,
  sampleLabel = NULL,
  alpha = 1,
  beta = 1,
  delta = 1,
  gamma = 1,
  minFeature = 3,
  reorder = TRUE,
  seed = 12345,
  perplexity = TRUE,
  doResampling = FALSE,
  numResample = 5,
  verbose = TRUE,
  logfile = NULL
)

# S4 method for SingleCellExperiment
recursiveSplitModule(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  initialL = 10,
  maxL = 100,
  tempK = 100,
  zInit = NULL,
  sampleLabel = NULL,
  alpha = 1,
  beta = 1,
  delta = 1,
  gamma = 1,
  minFeature = 3,
  reorder = TRUE,
  seed = 12345,
  perplexity = TRUE,
  doResampling = FALSE,
  numResample = 5,
  verbose = TRUE,
  logfile = NULL
)

# S4 method for matrix
recursiveSplitModule(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  initialL = 10,
  maxL = 100,
  tempK = 100,
  zInit = NULL,
  sampleLabel = NULL,
  alpha = 1,
  beta = 1,
  delta = 1,
  gamma = 1,
  minFeature = 3,
  reorder = TRUE,
  seed = 12345,
  perplexity = TRUE,
  doResampling = FALSE,
  numResample = 5,
  verbose = TRUE,
  logfile = NULL
)

Arguments

x

A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under useAssay. Rows represent features and columns represent cells.

useAssay

A string specifying which assay slot to use if x is a SingleCellExperiment object. Default "counts".

altExpName

The name for the altExp slot to use. Default "featureSubset".

initialL

Integer. Initial number of modules.

maxL

Integer. Maximum number of modules.

tempK

Integer. Number of temporary cell populations to identify and use in module splitting. Only used if zInit = NULL Collapsing cells to a relatively smaller number of cell popluations will increase the speed of module clustering and tend to produce better modules. This number should be larger than the number of true cell populations expected in the dataset. Default 100.

zInit

Integer vector. Collapse cells to cell populations based on labels in zInit and then perform module splitting. If NULL, no collapsing will be performed unless tempK is specified. Default NULL.

sampleLabel

Vector or factor. Denotes the sample label for each cell (column) in the count matrix. Default NULL.

alpha

Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Only used if zInit is set. Default 1.

beta

Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1.

delta

Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.

gamma

Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.

minFeature

Integer. Only attempt to split modules with at least this many features.

reorder

Logical. Whether to reorder modules using hierarchical clustering after each model has been created. If FALSE, module numbers will correspond to the split which created the module (i.e. 'L15' was created at split 15, 'L16' was created at split 16, etc.). Default TRUE.

seed

Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.

perplexity

Logical. Whether to calculate perplexity for each model. If FALSE, then perplexity can be calculated later with resamplePerplexity. Default TRUE.

doResampling

Boolean. If TRUE, then each cell in the counts matrix will be resampled according to a multinomial distribution to introduce noise before calculating perplexity. Default FALSE.

numResample

Integer. The number of times to resample the counts matrix for evaluating perplexity if doResampling is set to TRUE. Default 5.

verbose

Logical. Whether to print log messages. Default TRUE.

logfile

Character. Messages will be redirected to a file named "logfile". If NULL, messages will be printed to stdout. Default NULL.

Value

A SingleCellExperiment object. Function parameter settings and celda model results are stored in the metadata "celda_grid_search" slot. The models in the list will be of class celda_G if zInit = NULL or celda_CG if zInit is set.

See also

recursiveSplitCell for recursive splitting of cell populations.

Examples

data(sceCeldaCG) ## Create models that range from L=3 to L=20 by recursively splitting modules ## into two moduleSplit <- recursiveSplitModule(sceCeldaCG, initialL = 3, maxL = 20)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Sat Apr 30 15:15:22 2022 .. Collapsing to 100 temporary cell populations
#> Sat Apr 30 15:15:23 2022 .. Initializing with 3 modules
#> Sat Apr 30 15:15:23 2022 .. Created module 4 | logLik: -1241379.90928455
#> Sat Apr 30 15:15:23 2022 .. Created module 5 | logLik: -1235212.7977535
#> Sat Apr 30 15:15:23 2022 .. Created module 6 | logLik: -1232789.9817561
#> Sat Apr 30 15:15:24 2022 .. Created module 7 | logLik: -1227246.66090571
#> Sat Apr 30 15:15:24 2022 .. Created module 8 | logLik: -1223898.757694
#> Sat Apr 30 15:15:24 2022 .. Created module 9 | logLik: -1221848.26936098
#> Sat Apr 30 15:15:24 2022 .. Created module 10 | logLik: -1220147.96681948
#> Sat Apr 30 15:15:24 2022 .. Created module 11 | logLik: -1220818.37022325
#> Sat Apr 30 15:15:24 2022 .. Created module 12 | logLik: -1221489.07685946
#> Sat Apr 30 15:15:24 2022 .. Created module 13 | logLik: -1222032.53497571
#> Sat Apr 30 15:15:24 2022 .. Created module 14 | logLik: -1222712.17543857
#> Sat Apr 30 15:15:24 2022 .. Created module 15 | logLik: -1223268.97596756
#> Sat Apr 30 15:15:24 2022 .. Created module 16 | logLik: -1223841.4834406
#> Sat Apr 30 15:15:24 2022 .. Created module 17 | logLik: -1224394.02513994
#> Sat Apr 30 15:15:25 2022 .. Created module 18 | logLik: -1224863.41435811
#> Sat Apr 30 15:15:25 2022 .. Created module 19 | logLik: -1225480.30453125
#> Sat Apr 30 15:15:25 2022 .. Created module 20 | logLik: -1226156.47078695
#> Sat Apr 30 15:15:25 2022 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 2.877337 secs
#> ==================================================
## Example results with perplexity plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis celdaMod <- subsetCeldaList(moduleSplit, list(L = 10)) data(celdaCGSim) ## Create models that range from L=3 to L=20 by recursively splitting modules ## into two moduleSplit <- recursiveSplitModule(celdaCGSim$counts, initialL = 3, maxL = 20)
#> ==================================================
#> Starting recursive module splitting.
#> ==================================================
#> Sat Apr 30 15:15:25 2022 .. Collapsing to 100 temporary cell populations
#> Sat Apr 30 15:15:26 2022 .. Initializing with 3 modules
#> Sat Apr 30 15:15:26 2022 .. Created module 4 | logLik: -1243396.62348886
#> Sat Apr 30 15:15:26 2022 .. Created module 5 | logLik: -1237610.11790137
#> Sat Apr 30 15:15:26 2022 .. Created module 6 | logLik: -1232128.87013396
#> Sat Apr 30 15:15:26 2022 .. Created module 7 | logLik: -1227611.8250329
#> Sat Apr 30 15:15:26 2022 .. Created module 8 | logLik: -1225618.06184004
#> Sat Apr 30 15:15:26 2022 .. Created module 9 | logLik: -1223967.77531912
#> Sat Apr 30 15:15:26 2022 .. Created module 10 | logLik: -1222801.11395987
#> Sat Apr 30 15:15:27 2022 .. Created module 11 | logLik: -1223402.66903597
#> Sat Apr 30 15:15:27 2022 .. Created module 12 | logLik: -1224026.19892208
#> Sat Apr 30 15:15:27 2022 .. Created module 13 | logLik: -1224675.63005464
#> Sat Apr 30 15:15:27 2022 .. Created module 14 | logLik: -1225317.91966369
#> Sat Apr 30 15:15:27 2022 .. Created module 15 | logLik: -1225971.50555157
#> Sat Apr 30 15:15:27 2022 .. Created module 16 | logLik: -1226557.7881506
#> Sat Apr 30 15:15:27 2022 .. Created module 17 | logLik: -1227080.13473523
#> Sat Apr 30 15:15:27 2022 .. Created module 18 | logLik: -1227603.99622355
#> Sat Apr 30 15:15:27 2022 .. Created module 19 | logLik: -1228247.84169741
#> Sat Apr 30 15:15:27 2022 .. Created module 20 | logLik: -1228828.70617002
#> Sat Apr 30 15:15:27 2022 .. Calculating perplexity
#> ==================================================
#> Completed recursive module splitting. Total time: 2.221109 secs
#> ==================================================
## Example results with perplexity plotGridSearchPerplexity(moduleSplit)
## Select model for downstream analysis celdaMod <- subsetCeldaList(moduleSplit, list(L = 10))