Seurat is an R package (Butler et al., Nature Biotechnology 2018; Stuart, Butler, et al., Cell 2019) that offers functions for analyzing scRNA-Seq data from the R console. In singleCellTK, we implement all the common steps of the proposed workflow in an interactive, easy-to-use graphical interface that includes interactive visualizations. The purpose of this curated workflow is to let users follow a standardized, step-by-step workflow and analyze their data with minimal effort.
A general workflow for the Seurat tab is summarized in the figure below:
In this tutorial, we illustrate all the steps of the curated workflow and focus on the options available to customize each step as per user requirements. To initiate the Seurat workflow, click on ‘Curated Workflows’ in the top menu and select Seurat.
NOTE: This tutorial assumes that the data has already been uploaded via the upload tab of the toolkit and filtered before using the workflow.
Assuming that the data has been uploaded via the Upload tab of the toolkit, the first step of the analysis is normalization. Any assay available in the uploaded data can be normalized with one of the three methods available through Seurat: `LogNormalize`, `CLR` (Centered Log Ratio) or `RC` (Relative Counts).

1. Select the `assay` to normalize from the dropdown menu.
2. Select the normalization method: `LogNormalize`, `CLR` or `RC`.
3. Set the scale factor. Default is `10000`.
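The arithmetic behind `LogNormalize` and `RC` can be sketched in a few lines. This is a minimal Python illustration, assuming the behaviour described in the Seurat documentation: each count is divided by the cell's total count and multiplied by the scale factor, and `LogNormalize` then applies a natural-log `log1p`. It is not the Seurat implementation; the function names and the example counts are hypothetical, and `CLR` is not covered.

```python
import math


def relative_counts(counts, scale_factor=10000):
    """RC for one cell: count / total * scale_factor, with no log transform."""
    total = sum(counts)
    return [c / total * scale_factor for c in counts]


def log_normalize(counts, scale_factor=10000):
    """LogNormalize for one cell: log1p(count / total * scale_factor)."""
    return [math.log1p(v) for v in relative_counts(counts, scale_factor)]


# Hypothetical raw counts for one cell across four genes (total = 100).
cell = [0, 5, 15, 80]
rc = relative_counts(cell)
ln = log_normalize(cell)
```

With the default scale factor of `10000`, a gene holding 5% of a cell's counts gets a relative count of about 500, and a gene with zero counts stays at zero under both methods.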
Once normalization is complete, the data needs to be scaled and centered. Seurat uses `linear` (linear model), `poisson` (generalized linear model) or `negbinom` (generalized linear model) as the regression model.

1. Select the regression model: `linear`, `poisson` or `negbinom`.
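To make "scaled and centered" concrete, the sketch below applies a plain z-score to the values of a single gene across cells. It is a simplified Python illustration under that assumption only: the regression step that uses the `linear`, `poisson` or `negbinom` model is not shown, and the function name and input values are hypothetical.

```python
import statistics


def scale_gene(values):
    """Center one gene to mean 0 and scale it to unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        # A constant gene carries no variation; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical normalized values of one gene across five cells.
gene = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = scale_gene(gene)
```

After scaling, every gene has mean 0 and unit variance across cells, so highly expressed genes do not dominate the downstream dimensionality reduction.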
Identification of highly variable genes is core to the Seurat workflow, and these genes are used throughout the remaining steps. Seurat provides three methods for identifying variable genes: `vst` (uses local polynomial regression to fit a relationship between log of variance and log of mean), `mean.var.plot` (uses mean and dispersion to divide features into bins) and `dispersion` (uses highest dispersion values only).

1. Select the method: `vst`, `mean.var.plot` or `dispersion`.
2. Set the number of variable genes to identify. Default is `2000`.
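The idea behind the `dispersion` method, keeping only the genes with the highest dispersion values, can be sketched as follows. This Python example assumes dispersion is the variance-to-mean ratio of each gene and simply ranks genes by it; Seurat's own computation may differ in detail, and the function name, gene names and values are hypothetical.

```python
import statistics


def top_dispersion_genes(expression, n_features=2000):
    """Rank genes by variance-to-mean ratio and return the top n gene names."""
    dispersion = {}
    for gene, values in expression.items():
        mean = statistics.mean(values)
        if mean > 0:  # skip genes that are never expressed
            dispersion[gene] = statistics.variance(values) / mean
    ranked = sorted(dispersion, key=dispersion.get, reverse=True)
    return ranked[:n_features]


# Hypothetical expression values for four genes across four cells.
expression = {
    "geneA": [1, 1, 1, 1],  # constant, dispersion 0
    "geneB": [0, 0, 8, 0],  # expressed in one cell only, high dispersion
    "geneC": [2, 3, 2, 3],  # mild variation
    "geneD": [0, 0, 0, 0],  # all zero, skipped
}
top = top_dispersion_genes(expression, n_features=2)
```

Here `top` contains `geneB` followed by `geneC`: the gene expressed in a single cell varies most relative to its mean, while the constant and unexpressed genes are ranked last or dropped.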
The Seurat workflow offers `PCA` or `ICA` for dimensionality reduction, and the components from these methods can be used in the downstream analysis. Several plots are available for inspecting the output of the dimensionality reduction, such as the standard ‘PCA Plot’, ‘Elbow Plot’, ‘Jackstraw Plot’ and ‘Heatmap Plot’.

1. Set the number of components to compute. Default is `50`.
2. [...] `TRUE`.
‘tSNE’ and ‘UMAP’ can be computed and plotted once components are available from the ‘Dimensionality Reduction’ tab.
Cluster labels can be generated for all cells/samples using one of the computed reduction methods. Plots are automatically re-computed with cluster labels. The clustering algorithms provided by Seurat are `original Louvain algorithm`, `Louvain algorithm with multilevel refinement` and `SLM algorithm`.

1. Select the clustering algorithm: `original Louvain algorithm`, `Louvain algorithm with multilevel refinement` or `SLM algorithm`.
2. Set the resolution. Default is `0.8`.
3. [...] `TRUE`.
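The Louvain and SLM algorithms are modularity optimization methods, and a resolution parameter controls how many clusters they tend to produce. The Python sketch below is an illustration only, assuming the `0.8` default above is that resolution parameter and assuming the standard resolution-weighted modularity on an undirected, unweighted graph. It scores two fixed partitions of a tiny hypothetical graph rather than running any of the clustering algorithms, and it is not Seurat's implementation.

```python
def modularity(edges, labels, resolution=0.8):
    """Resolution-weighted modularity of a partition of an undirected graph."""
    m = len(edges)
    internal = {}  # number of edges with both ends in the same cluster
    degree = {}    # summed node degree per cluster
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    return sum(
        internal.get(c, 0) / m - resolution * (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


# Hypothetical graph: two triangles of cells joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

split_score = modularity(edges, two_clusters, resolution=0.8)
merged_score = modularity(edges, one_cluster, resolution=0.8)
```

At a resolution of `0.8` the two-cluster partition scores higher than the merged one, while at a much lower resolution such as `0.1` the single cluster wins. This is the sense in which higher resolution values yield more clusters and lower values yield fewer.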
The ‘Find Markers’ tab can be used to identify marker genes and visualize them using one of the provided visualization methods. Markers can be identified between two selected phenotype groups or between all groups; the choice is made at the time of computation. Markers that are conserved between two phenotype groups can also be identified. Individual marker genes can be visualized with a Ridge Plot, Violin Plot, Feature Plot or Heatmap Plot.
1. Select whether to identify marker genes against all groups in a biological variable or between two pre-defined groups. Alternatively, select the last option to identify marker genes that are conserved between two groups.
2. Select the phenotype variable that contains the grouping information.
3. Select the test used for marker gene identification.
4. Select whether only positive markers should be returned.
5. Press the “Find Markers” button to run marker identification.
6. Identified marker genes are populated in the table.
7. Filters can be applied to the table.
8. Filters allow different comparisons based on the type of the table column.
9. The table is re-populated after applying filters.
10. A heatmap can be plotted for all genes in the table (9) against all biological groups in the selected phenotype variable.
11. To visualize individual marker genes through gene plots, select them by clicking on the relevant rows of the table.
12. Selected marker genes from the table are plotted with gene plots.