Feature clustering with Celda

Clusters the rows (features) of a single-cell count matrix into L modules. If x is a SingleCellExperiment object, the `useAssay` assay in the `altExpName` altExp slot is used when that altExp exists; otherwise, the `useAssay` assay of x itself is used.

celda_G(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  L,
  beta = 1,
  delta = 1,
  gamma = 1,
  stopIter = 10,
  maxIter = 200,
  splitOnIter = 10,
  splitOnLast = TRUE,
  seed = 12345,
  nchains = 3,
  yInitialize = c("split", "random", "predefined"),
  countChecksum = NULL,
  yInit = NULL,
  logfile = NULL,
  verbose = TRUE
)

# S4 method for SingleCellExperiment
celda_G(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  L,
  beta = 1,
  delta = 1,
  gamma = 1,
  stopIter = 10,
  maxIter = 200,
  splitOnIter = 10,
  splitOnLast = TRUE,
  seed = 12345,
  nchains = 3,
  yInitialize = c("split", "random", "predefined"),
  countChecksum = NULL,
  yInit = NULL,
  logfile = NULL,
  verbose = TRUE
)

# S4 method for ANY
celda_G(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  L,
  beta = 1,
  delta = 1,
  gamma = 1,
  stopIter = 10,
  maxIter = 200,
  splitOnIter = 10,
  splitOnLast = TRUE,
  seed = 12345,
  nchains = 3,
  yInitialize = c("split", "random", "predefined"),
  countChecksum = NULL,
  yInit = NULL,
  logfile = NULL,
  verbose = TRUE
)

Arguments

x	A SingleCellExperiment with the matrix located in the assay slot under `useAssay`. Rows represent features and columns represent cells. Alternatively, any matrix-like object that can be coerced to a sparse matrix of class "dgCMatrix" can be directly used as input. The matrix will automatically be converted to a SingleCellExperiment object.
useAssay	A string specifying the name of the assay slot to use. Default "counts".
altExpName	The name for the altExp slot to use. Default "featureSubset".
L	Integer. Number of feature modules.
beta	Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell. Default 1.
delta	Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.
gamma	Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.
stopIter	Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.
maxIter	Integer. Maximum number of iterations of Gibbs sampling to perform. Default 200.
splitOnIter	Integer. On every `splitOnIter` iteration, a heuristic will be applied to determine if a feature module should be reassigned and another feature module should be split into two clusters. To disable splitting, set to -1. Default 10.
splitOnLast	Logical. After `stopIter` iterations have been performed without improvement, a heuristic will be applied to determine if a feature module should be reassigned and another feature module should be split into two clusters. If a split occurs, then `stopIter` will be reset. Default TRUE.
seed	Integer. Passed to with_seed. For reproducibility, a default value of 12345 is used. If NULL, no calls to with_seed are made.
nchains	Integer. Number of random cluster initializations. Default 3.
yInitialize	Character. One of 'random', 'split', or 'predefined'. With 'random', features are randomly assigned to modules. With 'split', features will be split into sqrt(L) modules and then each module will be subsequently split into another sqrt(L) modules. With 'predefined', values in `yInit` will be used to initialize `y`. Default 'split'.
countChecksum	Character. An MD5 checksum for the `counts` matrix. Default NULL.
yInit	Integer vector. Sets initial starting values of y. `yInit` can only be used when `yInitialize = 'predefined'`. Default NULL.
logfile	Character. Messages will be redirected to a file named `logfile`. If NULL, messages will be printed to stdout. Default NULL.
verbose	Logical. Whether to print log messages. Default TRUE.
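
The `yInitialize` and `yInit` arguments work as a pair. The following is a minimal sketch of a 'predefined' start, assuming the celda package and its `celdaGSim` example data are available. The starting labels built here are hypothetical: they simply cycle through 1..L so that every module starts non-empty, and in practice they would come from a previous clustering.

```r
library(celda)

data(celdaGSim)
counts <- celdaGSim$counts
L <- celdaGSim$L

# Hypothetical starting labels (an assumption for illustration):
# one integer label per feature, cycling through 1..L.
yInit <- rep(seq_len(L), length.out = nrow(counts))

# yInit is only used because yInitialize is set to "predefined".
sce <- celda_G(
  counts,
  L = L,
  yInitialize = "predefined",
  yInit = yInit,
  nchains = 1,
  verbose = FALSE
)
```

A single chain is used here because every chain would begin from the same predefined labels.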

Value

A SingleCellExperiment object. Function parameter settings are stored in the metadata "celda_parameters" slot. Column celda_feature_module in rowData contains feature modules.
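
As a sketch of how the returned object might be inspected, the code below reads the `celda_feature_module` column and the `celda_parameters` metadata entry named above. It assumes the results may be stored in the `altExpName` altExp of the returned object, and falls back to the top-level object if that altExp is absent; the variable names `res`, `modules`, and `params` are illustrative only.

```r
library(celda)
library(SingleCellExperiment)

data(celdaGSim)
sce <- celda_G(celdaGSim$counts, L = celdaGSim$L, nchains = 1, verbose = FALSE)

# Assumption: results live in the altExp named by altExpName when it exists.
altExpName <- "featureSubset"
res <- if (altExpName %in% altExpNames(sce)) altExp(sce, altExpName) else sce

# Module assignment for each feature (one entry per row).
modules <- rowData(res)$celda_feature_module
table(modules)

# Parameter settings recorded by celda_G.
params <- S4Vectors::metadata(res)$celda_parameters
names(params)
```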

Examples

data(celdaGSim)
sce <- celda_G(celdaGSim$counts, L = celdaGSim$L, nchains = 1)
#> --------------------------------------------------
#> Starting Celda_G: Clustering genes.
#> --------------------------------------------------
#> Sat Apr 30 15:14:09 2022 .. Initializing 'y' in chain 1 with 'split' 
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 1 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 2 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 3 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 4 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 5 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 6 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 7 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 8 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 9 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Determining if any gene clusters should be split.
#> Sat Apr 30 15:14:11 2022 .... No additional splitting was performed.
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 10 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .... Completed iteration: 11 | logLik: -290669.046132139
#> Sat Apr 30 15:14:11 2022 .. Finished chain 1
#> --------------------------------------------------
#> Completed Celda_G. Total time: 2.014838 secs
#> --------------------------------------------------
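
In the log above, the log likelihood never changes after iteration 1 and the chain finishes after iteration 11, which is consistent with the default `stopIter = 10`. The base R function below is a simplified sketch of that stopping rule, not celda's internal code: `stopIteration` is a hypothetical helper that takes a vector of per-iteration log likelihoods, ignores module splitting, and assumes that only a strict increase counts as an improvement.

```r
# Hypothetical helper (not part of celda): returns the iteration at which
# inference would stop for a given log-likelihood trace.
stopIteration <- function(logLik, stopIter = 10, maxIter = 200) {
  best <- -Inf
  noImprovement <- 0
  nIter <- min(length(logLik), maxIter)
  for (iter in seq_len(nIter)) {
    if (logLik[iter] > best) {
      # Strict improvement: remember it and reset the counter.
      best <- logLik[iter]
      noImprovement <- 0
    } else {
      noImprovement <- noImprovement + 1
    }
    # Stop once stopIter iterations in a row have failed to improve.
    if (noImprovement >= stopIter) {
      return(iter)
    }
  }
  # Otherwise the run ends at maxIter (or at the end of the trace).
  nIter
}

# A flat trace like the one in the log stops at iteration 11.
stopIteration(rep(-290669.046132139, 200))
```

A trace that keeps improving is instead cut off by `maxIter`.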
