Plots to help decide the number of clusters

To help decide the number of clusters, three methods are provided, each plotted against the number of clusters: total within-cluster sum of squares, average silhouette coefficient, and gap statistic.
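The three statistics can be illustrated outside musicatk. The sketch below uses base R `kmeans()` and the `cluster` package (`silhouette()`, `clusGap()`) on a hypothetical matrix `x` with one row per sample, standing in for whatever data `k_select()` actually clusters. It only shows what each statistic measures. It is not the `k_select()` implementation, which may compute these values differently.

```r
library(cluster)  # silhouette(), clusGap(); a recommended package shipped with R

# Hypothetical data standing in for the samples being clustered:
# three well-separated groups of 30 two-dimensional points each
set.seed(123)
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.3), ncol = 2)
)
n <- 6        # maximum number of clusters to test, like the `n` argument
d <- dist(x)  # Euclidean distances, needed for silhouettes

# Elbow method: total within-cluster sum of squares for k = 1..n
wss <- sapply(1:n, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Average silhouette width for k = 2..n (the silhouette is undefined for k = 1)
avg_sil <- sapply(2:n, function(k) {
  cl <- kmeans(x, centers = k, nstart = 25)$cluster
  summary(silhouette(cl, d))$avg.width
})

# Gap statistic for k = 1..n, comparing log(W_k) against 50 reference data sets
gap <- clusGap(x, FUNcluster = kmeans, K.max = n, B = 50, nstart = 25)

print(data.frame(k = 1:n, wss = wss, avg_sil = c(NA, avg_sil),
                 gap = gap$Tab[, "gap"]))
```

Read the `wss` column for the bend (the elbow) and prefer the k with the largest average silhouette width. For the gap statistic, `cluster::maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"])` applies a standard-error rule to pick k.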

Usage

k_select(
  musica,
  model_name,
  modality = "SBS96",
  result_name = "result",
  method = "wss",
  clust.method = "kmeans",
  n = 10,
  proportional = TRUE
)

Arguments

musica: A musica object containing a mutational signature discovery or prediction. A two-dimensional UMAP must be stored in this object.
model_name: The name of the desired model.
modality: The modality of the model. Must be "SBS96", "DBS78", or "IND83". Default "SBS96".
result_name: Name of the result list entry containing desired model. Default "result".
method: A single character string indicating which statistic to plot. Options are "wss" (total within-cluster sum of squares), "silhouette" (average silhouette coefficient), and "gap_stat" (gap statistic). Default is "wss".
clust.method: A character string indicating the clustering method. Options are "kmeans" (k-means, default), "hclust" (hierarchical clustering), "hkmeans" (hierarchical k-means), "pam" (partitioning around medoids), and "clara" (Clustering Large Applications).
n: An integer indicating maximum number of clusters to test. Default is 10.
proportional: Logical, indicating whether proportional exposures are used for clustering. Default TRUE.
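The sketch below illustrates `proportional` and two of the `clust.method` options in base R. It assumes that `proportional = TRUE` means each sample's exposures are rescaled to sum to one, and it uses a hypothetical signatures-by-samples matrix `expo`; musicatk's internal handling may differ. Sample S5 has a much lower mutation burden than the others, which raw exposures would emphasize but proportional exposures do not.

```r
# Hypothetical exposure matrix: 3 signatures (rows) x 6 samples (columns)
expo <- matrix(
  c(10,  0, 30,  5, 5, 40,
     0, 20, 10,  5, 5, 40,
    90, 80, 60, 90, 0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Sig", 1:3), paste0("S", 1:6))
)

# Assumed meaning of proportional = TRUE: rescale each sample (column) to sum to 1,
# so clustering reflects signature composition rather than mutation burden
prop_expo <- prop.table(expo, margin = 2)

# Samples are clustered, so transpose to one row per sample
samples <- t(prop_expo)

# Two of the clust.method options, reproduced with base R for k = 2
set.seed(123)
km_cl <- kmeans(samples, centers = 2, nstart = 10)$cluster  # like "kmeans"
hc_cl <- cutree(hclust(dist(samples)), k = 2)              # like "hclust"

print(round(prop_expo, 2))
print(rbind(kmeans = km_cl, hclust = hc_cl))
```

With `proportional = FALSE`, the same steps would run on `t(expo)` instead, and the total mutation count of each sample would influence the distances.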

Value

A ggplot object.

Examples

data(res_annot)
set.seed(123)
# Make an elbow plot
k_select(res_annot, model_name = "res_annot", method = "wss", n = 6)

# Plot average silhouette coefficient against number of clusters
k_select(res_annot, model_name = "res_annot", method = "silhouette", n = 6)

# Plot gap statistics against number of clusters
k_select(res_annot, model_name = "res_annot", method = "gap_stat", n = 6)
