A wrapper function for addPerCellQC. Calculates general quality control metrics for each cell in the count matrix.

runPerCellQC(
  inSCE,
  useAssay = "counts",
  collectionName = NULL,
  geneSetList = NULL,
  geneSetListLocation = "rownames",
  geneSetCollection = NULL,
  mitoRef = NULL,
  mitoIDType = NULL,
  mitoPrefix = NULL,
  mitoID = NULL,
  mitoGeneLocation = NULL,
  percent_top = c(50, 100, 200, 500),
  use_altexps = FALSE,
  flatten = TRUE,
  detectionLimit = 0,
  BPPARAM = BiocParallel::SerialParam()
)

Arguments

inSCE

Input SingleCellExperiment object.

useAssay

A string specifying which assay in the SCE to use. Default "counts".

collectionName

Character. Name of a GeneSetCollection obtained by using one of the importGeneSet* functions. Default NULL.

geneSetList

List of gene sets to be quantified. The genes in the assays will be matched to the genes in the list based on geneSetListLocation. Default NULL.

geneSetListLocation

Character or numeric vector. If set to 'rownames', then the genes in 'geneSetList' will be looked up in rownames(inSCE). If another character string is supplied, then genes will be looked up in the column of rowData(inSCE) with that name. A character vector with the same length as geneSetList can be supplied if the IDs for different gene sets are found in different places, including a mixture of 'rownames' and columns of rowData(inSCE). An integer or integer vector can be supplied to denote the column index in rowData(inSCE). Default 'rownames'.
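The lookup rule described above can be sketched in base R. This only illustrates the matching logic and is not the package's implementation: featureIDs and featureInfo are hypothetical stand-ins for rownames(inSCE) and rowData(inSCE), and lookupGenes is a made-up helper.

```r
# Hypothetical stand-ins for rownames(inSCE) and rowData(inSCE).
featureIDs  <- c("ENSG01", "ENSG02", "ENSG03")
featureInfo <- data.frame(symbol = c("MT-ND4", "ACTB", "MT-CO1"))

# Two gene sets whose IDs live in different places.
geneSetList <- list(Mito = c("MT-ND4", "MT-CO1"),
                    Housekeeping = c("ENSG02"))
geneSetListLocation <- c("symbol", "rownames")

# Made-up helper: return the row indices of the genes in one set.
# 'location' may be "rownames", a column name, or a column index.
lookupGenes <- function(genes, location) {
  pool <- if (identical(location, "rownames")) {
    featureIDs
  } else {
    featureInfo[[location]]
  }
  which(pool %in% genes)
}

# Pair each gene set with its own location.
idx <- Map(lookupGenes, geneSetList, geneSetListLocation)
idx
```

Here the "Mito" set is matched against the symbol column and the "Housekeeping" set against the row names, which is the mixed case the description mentions.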

geneSetCollection

An object of class GeneSetCollection from the GSEABase package. The location of the gene IDs in inSCE should be given in the description slot of each gene set and should follow the same notation as geneSetListLocation. The function getGmt can be used to read in gene sets from a GMT file. If reading a GMT file, the second column for each gene set should be the description denoting the location of the gene IDs in inSCE. These gene sets will be included with those from geneSetList if both parameters are provided.

mitoRef

Character. The species used to extract mitochondrial gene IDs from the built-in mitochondrial gene set in SCTK. Available species options are "human" and "mouse". Default is NULL.

mitoIDType

Character. Type of mitochondrial gene ID. Currently supported values are "symbol", "entrez", "ensembl" and "ensemblTranscriptID". It is used with mitoRef to extract mitochondrial genes from the built-in mitochondrial gene set in SCTK. Default NULL.

mitoPrefix

Character. The prefix used to identify mitochondrial genes in either rownames(inSCE) or the column of rowData(inSCE) specified by mitoGeneLocation. This parameter is usually used to extract mitochondrial genes by gene symbol. For example, mitoPrefix = "^MT-" can be used to detect mitochondrial gene symbols like "MT-ND4".
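As a rough illustration of what such a prefix matches, the following base R sketch applies "^MT-" to a hypothetical vector of gene symbols. It assumes the prefix behaves like an ordinary case-sensitive regular expression; the symbols are made up for the example and do not come from any dataset.

```r
# Hypothetical gene symbols; in practice these would come from
# rownames(inSCE) or a column of rowData(inSCE).
symbols <- c("MT-ND4", "ACTB", "MT-CO1", "MTOR", "mt-Nd1")

mitoPrefix <- "^MT-"

# TRUE only for symbols that start with the literal text "MT-".
isMito <- grepl(mitoPrefix, symbols)
symbols[isMito]
```

"MTOR" is not matched because it lacks the hyphen, and the lower-case mouse-style symbol "mt-Nd1" is not matched under case-sensitive matching, so a different prefix would be needed for such symbols.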

mitoID

Character. A vector of mitochondrial genes to be quantified.

mitoGeneLocation

Character. Describes the location within inSCE where the gene identifiers in the mitochondrial gene sets should be mapped. If set to "rownames", then the features will be searched for among rownames(inSCE). This can also be set to one of the column names of rowData(inSCE), in which case the gene identifiers will be mapped to that column in the rowData of inSCE. See featureIndex for more information. Default NULL.

percent_top

An integer vector. Each element is treated as a number of top genes to compute the percentage of library size occupied by the most highly expressed genes in each cell.
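The quantity behind percent_top can be sketched in base R as follows. This is a minimal illustration of the definition (share of a cell's total counts taken by its n most highly expressed genes), not the code the underlying QC function runs; the count matrix and the percentTop helper are hypothetical.

```r
# Hypothetical 5-gene x 2-cell count matrix (filled column by column).
counts <- matrix(c(10, 5, 3, 2, 0,
                    1, 1, 1, 1, 1),
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5), c("cellA", "cellB")))

# Made-up helper: percentage of a cell's library size that comes from
# its n most highly expressed genes.
percentTop <- function(x, n) {
  top <- head(sort(x, decreasing = TRUE), n)
  100 * sum(top) / sum(x)
}

pctTop2 <- apply(counts, 2, percentTop, n = 2)
pctTop2
```

In this toy data, cellA has 15 of its 20 counts in its top two genes (75%), while the evenly expressed cellB has 2 of 5 (40%). A high value indicates a library dominated by a few genes.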

use_altexps

Logical scalar indicating whether QC statistics should be computed for alternative experiments in inSCE. If TRUE, statistics are computed for all alternative experiments. Alternatively, an integer or character vector specifying the alternative experiments to use to compute QC statistics. Alternatively NULL, in which case alternative experiments are not used.

flatten

Logical scalar indicating whether the nested DataFrame-class in the output should be flattened.

detectionLimit

A numeric scalar specifying the lower detection limit for expression.
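The effect of the detection limit can be illustrated with a small base R sketch. It assumes a feature counts as detected when its count is strictly greater than the limit; the matrix is hypothetical and the code is not the package's implementation.

```r
# Hypothetical 4-gene x 2-cell count matrix (filled column by column).
counts <- matrix(c(0, 1, 4, 0,
                   2, 0, 0, 7),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("cellA", "cellB")))

# Number of features per cell whose count exceeds the limit.
detectedAt0 <- colSums(counts > 0)  # default limit of 0
detectedAt1 <- colSums(counts > 1)  # stricter limit of 1

detectedAt0
detectedAt1
```

Raising the limit from 0 to 1 drops the gene with a single count in cellA, so that cell goes from two detected features to one, while cellB keeps two.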

BPPARAM

A BiocParallelParam object specifying whether the QC calculations should be parallelized.

Value

A SingleCellExperiment object with cell QC metrics added to the colData slot. If geneSetList or geneSetCollection is provided, then the rownames for each gene set will be saved in metadata(inSCE)$scater$addPerCellQC$geneSets.

Details

This function allows multiple ways to import mitochondrial genes and quantify their expression.

  • Using the mitoRef, mitoIDType and mitoGeneLocation parameters will load the built-in mitochondrial gene set in the SCTK package.

  • Using the mitoPrefix and mitoGeneLocation parameters will extract mitochondrial genes from either rownames(inSCE) or the column of rowData(inSCE) specified by the parameter mitoGeneLocation.

  • Using mitoID and mitoGeneLocation parameters will quantify the expression of mitochondrial genes stored in mitoID.

mitoGeneLocation is required if you use any of the methods mentioned above to quantify mitochondrial gene expression. Please make sure mitoGeneLocation points to the location within the inSCE object that stores the correct mitochondrial gene IDs.
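The three approaches above might be called as in the sketch below. It uses only arguments from the usage shown on this page, and it assumes that the scExample data are human and that rowData(sce)$feature_name holds gene symbols. The code runs only when singleCellTK is installed; the output object names are illustrative.

```r
# Regular expression for human mitochondrial gene symbols.
mitoPattern <- "^MT-"

if (requireNamespace("singleCellTK", quietly = TRUE)) {
  library(singleCellTK)
  data(scExample, package = "singleCellTK")

  # 1. Built-in gene set: species and ID type select the reference,
  #    mitoGeneLocation says where matching IDs live in the object.
  sceRef <- runPerCellQC(sce, mitoRef = "human", mitoIDType = "symbol",
                         mitoGeneLocation = "feature_name")

  # 2. Prefix: pick up every feature whose symbol matches the pattern.
  scePrefix <- runPerCellQC(sce, mitoPrefix = mitoPattern,
                            mitoGeneLocation = "feature_name")

  # 3. Explicit IDs: pass the genes to quantify directly. They are taken
  #    from the data here so that every ID is known to be present.
  mitoSymbols <- grep(mitoPattern, rowData(sce)$feature_name, value = TRUE)
  sceID <- runPerCellQC(sce, mitoID = mitoSymbols,
                        mitoGeneLocation = "feature_name")
}
```

Each call sets mitoGeneLocation, in line with the requirement stated above.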

Examples

data(scExample, package = "singleCellTK")
mito.ix <- grep("^MT-", rowData(sce)$feature_name)
geneSet <- list("Mito"=rownames(sce)[mito.ix])
sce <- runPerCellQC(sce, geneSetList = geneSet)
#> Thu Apr 28 11:28:37 2022 ... Running 'perCellQCMetrics'