Downstream analysis - trajectory analysis using Monocle3

Why single cell trajectory analysis

The development of cells in multicellular organisms is a tightly regulated process that unfolds through a series of lineage decisions and differentiation events. These processes result in a diverse array of specialized cell types, each with distinct functional roles. Understanding the dynamics of cell development is crucial for elucidating fundamental biological mechanisms, and single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for studying these processes at an unprecedented resolution.

Trajectory analysis, combined with pseudotime reconstruction, provides a framework for investigating the temporal and developmental progression of cells within a dataset. Using computational tools like Monocle3, researchers can infer cell trajectories based on the high-dimensional expression profiles of genes, identifying how undifferentiated cells transition through intermediate states toward terminal fates.

Trajectory Reconstruction:

Tools like Monocle3 organize single cells into trajectories by arranging them along a developmental axis, reflecting the continuum of cellular states.

These trajectories are inferred without prior knowledge of time or lineage markers, making them especially powerful for systems where developmental pathways are not fully mapped.

Pseudotime Analysis:

Pseudotime represents an inferred temporal ordering of cells along a trajectory.

It enables the identification of genes and pathways that are dynamically regulated as cells transition through developmental states.

Developmental process of the cell types in C. elegans

The developmental process of the cell types involves a progression through various stages, starting from neuroblasts (progenitor cells) and leading to differentiated neuron types, as described below:

Neuroblasts (Progenitors):

The process begins with neuroblasts, which are multipotent progenitor cells. Examples include: * Neuroblast_ADF_AWB: Precursor to the ADF and AWB neurons. * Neuroblast_AFD_RMD: Precursor to the AFD and RMD neurons. * Neuroblast_ASE_ASJ_AUA: Precursor to the ASE, ASJ, and AUA neurons. * Neuroblast_ASG_AWA: Precursor to the ASG and AWA neurons.

Parent Cells:

Some intermediate stages are represented by parent cell types, such as: * ADL_parent: Gives rise to ADL neurons. * ASI_parent: Gives rise to ASI neurons. * ASE_parent: Gives rise to ASEL and ASER neurons. * ASK_parent: Gives rise to ASK neurons.

Differentiated Neurons:

Fully differentiated neurons emerge from the parent cells and neuroblasts, including:

ADF (Amphid Dorsal Left)
ADL (Amphid Dorsal Left)
AFD (Amphid Fan-shaped Dorsal)
ASE (Amphid Sensory neurons), with subtypes ASEL (Left) and ASER (Right).
ASG (Amphid Sensory neurons Group)
ASH (Amphid Sensory neurons Hypodermal)
ASI (Amphid Sensory neurons Inner)
ASJ (Amphid Sensory neurons Junction)
ASK (Amphid Sensory neurons King)
AWA (Amphid Wing-shaped neurons A)
AWB (Amphid Wing-shaped neurons B)
AWC (Amphid Wing-shaped neurons C), with subtype AWC_ON.
AUA (Amphid Unpaired A neuron)

The neuroblasts differentiate into parent cell types or directly into specific neuron subtypes. Parent cells serve as intermediate stages, further dividing or maturing into various functional neuron types. This hierarchical process ensures the development of diverse neuronal subtypes specialized for distinct sensory and functional roles.

R package installation

Installation of the required packages may take more than 1.5 hours.

Same as other lab notes, a .tar.gz file includes all the library files will be downloaded and uncompressed for preparing the R environment.

#if (!requireNamespace("BiocManager", quietly = TRUE))
#install.packages("BiocManager")
#BiocManager::install(version = "3.20")

## required by scater package
system("apt-get install libx11-dev libcairo2-dev") #, intern = TRUE)

#BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
#                       'limma', 'lme4', 'S4Vectors', 'SingleCellExperiment',
#                       'SummarizedExperiment', 'batchelor', 'HDF5Array',
#                       'terra', 'ggrastr'))

#install.packages("devtools")
#devtools::install_github('cole-trapnell-lab/monocle3')

#system("tar zcvf R_lib_monocle3.tar.gz /usr/local/lib/R/site-library")

Configure the environment using existing library files

# https://drive.google.com/file/d/1wCqb1oCfxeplWR7jf3vkPWDavVsGAQzZ/view?usp=sharing
system("gdown 1wCqb1oCfxeplWR7jf3vkPWDavVsGAQzZ")

system("md5sum R_lib_monocle3.tar.gz", intern = TRUE)

'74998728fb9870f0e3e728c7c6449532 R_lib_monocle3.tar.gz'

system("tar zxvf R_lib_monocle3.tar.gz")

.libPaths(c("/content/usr/local/lib/R/site-library", .libPaths()))

.libPaths()

'/content/usr/local/lib/R/site-library'
'/usr/local/lib/R/site-library'
'/usr/lib/R/site-library'
'/usr/lib/R/library'

Load required R packages

library(monocle3)
library(dplyr)

Loading required package: Biobase

Loading required package: BiocGenerics


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
    table, tapply, union, unique, unsplit, which.max, which.min


Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Loading required package: SingleCellExperiment

Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: ‘matrixStats’


The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians



Attaching package: ‘MatrixGenerics’


The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars


The following object is masked from ‘package:Biobase’:

    rowMedians


Loading required package: GenomicRanges

Loading required package: stats4

Loading required package: S4Vectors


Attaching package: ‘S4Vectors’


The following object is masked from ‘package:utils’:

    findMatches


The following objects are masked from ‘package:base’:

    expand.grid, I, unname


Loading required package: IRanges

Loading required package: GenomeInfoDb


Attaching package: ‘monocle3’


The following objects are masked from ‘package:Biobase’:

    exprs, fData, fData<-, pData, pData<-



Attaching package: ‘dplyr’


The following objects are masked from ‘package:GenomicRanges’:

    intersect, setdiff, union


The following object is masked from ‘package:GenomeInfoDb’:

    intersect


The following objects are masked from ‘package:IRanges’:

    collapse, desc, intersect, setdiff, slice, union


The following objects are masked from ‘package:S4Vectors’:

    first, intersect, rename, setdiff, setequal, union


The following object is masked from ‘package:matrixStats’:

    count


The following object is masked from ‘package:Biobase’:

    combine


The following objects are masked from ‘package:BiocGenerics’:

    combine, intersect, setdiff, union


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

When we were working on the scRNA-seq data in C. elegans, we didn't perform cell type annotation because of limited time.

expression_matrix <- readRDS(url("https://depts.washington.edu:/trapnell-lab/software/monocle3/celegans/data/packer_embryo_expression.rds"))
cell_metadata <- readRDS(url("https://depts.washington.edu:/trapnell-lab/software/monocle3/celegans/data/packer_embryo_colData.rds"))
gene_annotation <- readRDS(url("https://depts.washington.edu:/trapnell-lab/software/monocle3/celegans/data/packer_embryo_rowData.rds"))

cds <- new_cell_data_set(expression_matrix,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_annotation)

cds

class: cell_data_set 
dim: 20222 6188 
metadata(1): cds_version
assays(1): counts
rownames(20222): WBGene00010957 WBGene00010958 ... WBGene00021594
  WBGene00007064
rowData names(3): id gene_short_name num_cells_expressed
colnames(6188): AAACCTGCAAGACGTG-300.1.1 AAACCTGGTGTGAATA-300.1.1 ...
  TGCGGGTAGTACTTGC-b02 TTTGTCAAGTACACCT-b02
colData names(19): cell n.umi ... bg.b01.loading bg.b02.loading
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

class(expression_matrix)
dim(expression_matrix)

'dgCMatrix'

20222
6188

head(cell_metadata, 4)

A data.frame: 4 × 19
	cell	n.umi	time.point	batch	Size_Factor	raw.embryo.time	embryo.time	embryo.time.bin	raw.embryo.time.bin	lineage	num_genes_expressed	cell.type	bg.300.loading	bg.400.loading	bg.500.1.loading	bg.500.2.loading	bg.r17.loading	bg.b01.loading	bg.b02.loading
	<chr>	<dbl>	<fct>	<fct>	<dbl>	<int>	<dbl>	<fct>	<fct>	<chr>	<int>	<chr>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
AAACCTGCAAGACGTG-300.1.1	AAACCTGCAAGACGTG-300.1.1	1003	300_minutes	Waterston_300_minutes	0.7795692	350	350	330-390	330-390	ABalpppapav/ABpraaaapav	646	AFD	0.808794	0.2324676	-2.000489	-2.425965	-0.5436492	-2.2848042	-2.1302609
AAACCTGGTGTGAATA-300.1.1	AAACCTGGTGTGAATA-300.1.1	1458	300_minutes	Waterston_300_minutes	1.1332123	190	190	170-210	170-210	ABalppppa/ABpraaapa	857	NA	9.220938	3.9429037	-3.420859	-3.479376	4.8987977	1.6406862	0.1534805
AAACCTGTCGGCCGAT-300.1.1	AAACCTGTCGGCCGAT-300.1.1	1633	300_minutes	Waterston_300_minutes	1.2692289	260	245	210-270	210-270	ABpxpaaaaa	865	NA	6.008029	2.2257000	-3.630310	-3.828569	1.9894965	-0.1370570	-0.5189810
AAAGATGGTTCGTTGA-300.1.1	AAAGATGGTTCGTTGA-300.1.1	1716	300_minutes	Waterston_300_minutes	1.3337396	220	225	210-270	210-270	NA	873	NA	7.518360	3.0385123	-3.932011	-4.290579	1.9108642	-0.9612141	-2.2660029

If you work on your own dataset, the meta data information will be created/collected by yourself. In this notebook, the metadata has been created/collected by the authors. In Seurat object, metadata information is stored in @metadata dataframe.

Column cell.type: Cell type annotation performed by the authors in the Science paper

Column embryo.time.bin: For each cell, the age of the embryo that it came from was estimated by correlating its transcriptome with a bulk RNA-seq time series. The bulk RNA-seq time series are from the paper here.

table(cell_metadata$cell.type)

                   ADF                ADF_AWB                    ADL 
                   170                    102                    477 
            ADL_parent                    AFD                    ASE 
                   148                    326                    205 
            ASE_parent                   ASEL                   ASER 
                   149                     38                     39 
                   ASG                ASG_AWA                    ASH 
                   173                     99                    345 
                   ASI             ASI_parent                    ASJ 
                   212                    187                    320 
                   ASK             ASK_parent                    AUA 
                   233                    150                     98 
                   AWA                    AWB                    AWC 
                   236                    212                    309 
                AWC_ON     Neuroblast_ADF_AWB     Neuroblast_AFD_RMD 
                     9                    131                    147 
Neuroblast_ASE_ASJ_AUA     Neuroblast_ASG_AWA     Neuroblast_ASJ_AUA 
                   103                    142                    123

head(gene_annotation)

A data.frame: 6 × 3
	id	gene_short_name	num_cells_expressed
	<chr>	<chr>	<int>
WBGene00010957	WBGene00010957	nduo-6	6038
WBGene00010958	WBGene00010958	ndfl-4	1597
WBGene00010959	WBGene00010959	nduo-1	5342
WBGene00010960	WBGene00010960	atp-6	5921
WBGene00010961	WBGene00010961	nduo-2	2686
WBGene00000829	WBGene00000829	ctb-1	5079

Pre-process the data

Most analyses (including trajectory inference, and clustering) in Monocle3, require various normalization and preprocessing steps. preprocess_cds executes and stores these preprocessing steps.

Specifically, depending on the options selected, preprocess_cds first normalizes the data by log and size factor to address depth differences, or by size factor only. Next, preprocess_cds calculates a lower dimensional space that will be used as the input for further dimensionality reduction like tSNE and UMAP.

In monocle3, cds is short for cell_data_set (CDS) object.

cds <- preprocess_cds(cds, num_dim = 50)

Data sets that contain cells from different groups often benefit from alignment to subtract differences between them. Alignment can be used to remove batch effects, subtract the effects of treatments, or even potentially compare across species. align_cds executes alignment and stores these adjusted coordinates.

This function can be used to subtract both continuous and discrete batch effects.

⚠️ Important

Your own dataset might not include the batch-loading information demonstrated here. You will need to address batch effects using the batch information specific to your data. For instance, if your data were collected at different locations, the location data could be used as a factor to correct for batch effects.

cds <- align_cds(cds, alignment_group = "batch",
           residual_model_formula_str = "~ bg.300.loading + bg.400.loading +
           bg.500.1.loading + bg.500.2.loading + bg.r17.loading +
           bg.b01.loading + bg.b02.loading")

Aligning cells from different batches using Batchelor.
Please remember to cite:
     Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). 'Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.' Nat. Biotechnol., 36(5), 421-427. doi: 10.1038/nbt.4091

Reduce dimensionality and visualize the results

cds <- reduce_dimension(cds)

No preprocess_method specified, and aligned coordinates have been computed previously. Using preprocess_method = 'Aligned'

plot_cells(cds, label_groups_by_cluster=FALSE,  color_cells_by = "cell.type")

No trajectory to plot. Has learn_graph() been called yet?

Warning message:
“[1m[22mRemoved 1 row containing missing values or values outside the scale range
(`geom_text_repel()`).”

png

You can use plot_cells() to visualize the variation of individual genes along the trajectory. Let's examine a few genes that exhibit intriguing expression patterns in ciliated neurons:

ciliated_genes <- c("che-1",
                    "hlh-17",
                    "nhr-6",
                    "dmd-6",
                    "ceh-36",
                    "ham-1")

plot_cells(cds,
           genes=ciliated_genes,
           label_cell_groups=FALSE,
           show_trajectory_graph=FALSE)

png

The C. elegans che-1 gene encodes a zinc finger transcription factor required for specification of the ASE chemosensory neurons.

Cluster your cells

This function takes a cell_data_set as input, clusters the cells using Louvain or Leiden community detection, and returns a cell_data_set with internally stored cluster assignments. In addition to clustering, the function calculates partitions, representing superclusters of the Louvain or Leiden communities, identified through a kNN pruning method. Cluster assignments can be accessed via the clusters function, and partition assignments can be accessed via the partitions function.

cds <- cluster_cells(cds)

plot_cells(cds, color_cells_by = "partition")

No trajectory to plot. Has learn_graph() been called yet?

png

Learn the trajectory graph

Monocle3 aims to learn how cells transition through a biological program of gene expression changes in an experiment.

Each cell can be viewed as a point in a high-dimensional space, where each dimension describes the expression of a different gene.

Identifying the program of gene expression changes is equivalent to learning a trajectory that the cells follow through this space.

However, the more dimensions there are in the analysis, the harder the trajectory is to learn.

Fortunately, many genes typically co-vary with one another, and so the dimensionality of the data can be reduced with a wide variety of different algorithms.

Monocle3 provides two different algorithms for dimensionality reduction via reduce_dimension (UMAP and tSNE). Both take a cell_data_set object and a number of dimensions allowed for the reduced space.

You can also provide a model formula indicating some variables (e.g. batch ID or other technical factors) to "subtract" from the data so it doesn't contribute to the trajectory. The function learn_graph is the fourth step in the trajectory building process after preprocess_cds, reduce_dimension, and cluster_cells. After learn_graph, order_cells is typically called.

Principal graph

Monocle uses reverse graph embedding (RGE) to map the cells to a lower-dimensional latent space, i.e. each cell \(\boldsymbol{x}_i, i=1, \ldots, N\) has a corresponding latent point \(\boldsymbol{z}_i\). These latent points are clustered in a way similar to k-means by iteratively fitting of a small set of centroids, \(\boldsymbol{y}_k, k=1, \ldots, K(K \leq N)\). The principal graph is then built on these centroids. Finally the latent points are mapped on the nearest point on this qraph to obtain their pseudotimes

cds <- learn_graph(cds)
plot_cells(cds,
           color_cells_by = "cell.type",
           label_groups_by_cluster=FALSE,
           label_leaves=FALSE,
           label_branch_points=FALSE)

  |======================================================================| 100%
  |======================================================================| 100%


Warning message:
“[1m[22mRemoved 1 row containing missing values or values outside the scale range
(`geom_text_repel()`).”

png

plot_cells(cds,
           color_cells_by = "embryo.time.bin",
           label_cell_groups=FALSE,
           label_leaves=TRUE,
           label_branch_points=TRUE,
           graph_label_size=1.5)

png

principal_graph(cds)

List of length 1
names(1): UMAP

p_graph <- principal_graph(cds)[["UMAP"]]
igraph::V(p_graph) # V(): graph -> vertices

+ 343/343 vertices, named, from 446a6a2:
  [1] Y_1   Y_2   Y_3   Y_4   Y_5   Y_6   Y_7   Y_8   Y_9   Y_10  Y_11  Y_12 
 [13] Y_13  Y_14  Y_15  Y_16  Y_17  Y_18  Y_19  Y_20  Y_21  Y_22  Y_23  Y_24 
 [25] Y_25  Y_26  Y_27  Y_28  Y_29  Y_30  Y_31  Y_32  Y_33  Y_34  Y_35  Y_36 
 [37] Y_37  Y_38  Y_39  Y_40  Y_41  Y_42  Y_43  Y_44  Y_45  Y_46  Y_47  Y_48 
 [49] Y_49  Y_50  Y_51  Y_52  Y_53  Y_54  Y_55  Y_56  Y_57  Y_58  Y_59  Y_60 
 [61] Y_61  Y_62  Y_63  Y_64  Y_65  Y_66  Y_67  Y_68  Y_69  Y_70  Y_71  Y_72 
 [73] Y_73  Y_74  Y_75  Y_76  Y_77  Y_78  Y_79  Y_80  Y_81  Y_82  Y_83  Y_84 
 [85] Y_85  Y_86  Y_87  Y_88  Y_89  Y_90  Y_91  Y_92  Y_93  Y_94  Y_95  Y_96 
 [97] Y_97  Y_98  Y_99  Y_100 Y_101 Y_102 Y_103 Y_104 Y_105 Y_106 Y_107 Y_108
[109] Y_109 Y_110 Y_111 Y_112 Y_113 Y_114 Y_115 Y_116 Y_117 Y_118 Y_119 Y_120
+ ... omitted several vertices

plot_cells(cds,
           color_cells_by = "embryo.time.bin",
           label_cell_groups=FALSE,
           label_leaves=TRUE,
           label_branch_points=TRUE,
           graph_label_size=1.5)

png

plot_cells(cds,
          color_cells_by = "embryo.time.bin",
          label_cell_groups=FALSE,
          label_groups_by_cluster=FALSE,
          label_leaves=FALSE,
          label_branch_points=TRUE,
          label_principal_points = TRUE,       # set this to TRUE
          graph_label_size=3)

png

Order cells

Assigns cells a pseudotime value based on their projection on the principal graph learned in the learn_graph function and the position of chosen root states.

This function takes as input a cell_data_set and returns it with pseudotime information stored internally. order_cells() optionally takes "root" state(s) in the form of cell or principal graph node IDs, which you can use to specify the start of the trajectory. If you don't provide a root state, an plot will be generated where you can choose the root state(s) interactively.

# a helper function to identify the root principal points:
get_earliest_principal_node <- function(cds, time_bin="130-170"){
  cell_ids <- which(colData(cds)[, "embryo.time.bin"] == time_bin)
  # vertex is also called node in a graph
  closest_vertex <-
  cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  root_pr_nodes <-
  igraph::V(principal_graph(cds)[["UMAP"]])$name[as.numeric(names
  (which.max(table(closest_vertex[cell_ids,]))))]

  root_pr_nodes
}

The function above, get_earliest_principal_node, helps find the "starting point" in a path that cells follow as they change or develop, based on some time-related information.

get_earliest_principal_node(cds)

'Y_62'

cds <- order_cells(cds, root_pr_nodes = get_earliest_principal_node(cds))

plot_cells(cds,
           color_cells_by = "pseudotime",
           label_cell_groups=FALSE,
           label_leaves=FALSE,
           label_branch_points=FALSE,
           graph_label_size=1.5)

png

Finding genes that change as a function of pseudotime Identifying the genes that change as cells progress along a trajectory is a core objective of this type of analysis. Knowing the order in which genes go on and off can inform new models of development.

Let's return to the embryo data, which we processed using the commands

You can use graph_test() to find genes that are differentially expressed on the different path through the trajectory. The parameter, neighbor_graph="principal_graph", tells graph_test() to test whether cells at similar positions on the trajectory have correlated expression:

ciliated_cds_pr_test_res <- graph_test(cds, neighbor_graph="principal_graph", cores=4)
pr_deg_ids <- row.names(subset(ciliated_cds_pr_test_res, q_value < 0.05))

  |===========================================================================| 100%, Elapsed 10:03

pr_deg_ids[1:10]

'WBGene00010957'
'WBGene00010958'
'WBGene00010959'
'WBGene00010960'
'WBGene00010961'
'WBGene00000829'
'WBGene00010962'
'WBGene00010963'
'WBGene00010964'
'WBGene00010965'

Here are a couple of interesting genes that score as highly significant according to graph_test():

plot_cells(cds, genes=c("hlh-4", "gcy-8", "dac-1", "oig-8"),
           show_trajectory_graph=FALSE,
           label_cell_groups=FALSE,
           label_leaves=FALSE)

png

We can then collect the trajectory-variable genes into modules:

gene_module_df <- find_gene_modules(cds[pr_deg_ids,], resolution=c(10^seq(-6,-1)))

dim(gene_module_df)
head(gene_module_df)

8065
5

A tibble: 6 × 5
id	module	supermodule	dim_1	dim_2
<chr>	<fct>	<fct>	<dbl>	<dbl>
WBGene00010957	7	1	-4.377187	-1.642148
WBGene00010958	7	1	-4.623665	-1.831054
WBGene00010959	7	1	-4.359261	-1.622316
WBGene00010960	7	1	-4.383083	-1.655990
WBGene00010961	7	1	-4.393384	-1.639716
WBGene00000829	7	1	-4.366506	-1.660363

cell_group_df <- tibble::tibble(cell=row.names(colData(cds)),
                                cell_group=colData(cds)$cell.type)
agg_mat <- aggregate_gene_expression(cds, gene_module_df, cell_group_df)
row.names(agg_mat) <- stringr::str_c("Module ", row.names(agg_mat))
pheatmap::pheatmap(agg_mat,
                   scale="column", clustering_method="ward.D2")

png

gene_module_df <- find_gene_modules(cds[pr_deg_ids,], resolution=c(10^seq(-6,-1)))

You can also use plot_cells() on gene_module_df:

plot_cells(cds,
           genes=gene_module_df %>% dplyr::filter(module %in% c(27, 10, 7, 30)),
           label_cell_groups=FALSE,
           show_trajectory_graph=FALSE)

png

Monocle offers another plotting function that can sometimes give a clearer view of a gene's dynamics along a single path. You can select a path with choose_cells() or by subsetting the cell data set by cluster, cell type, or other annotation that's restricted to the path. Let's pick one such path, the AFD cells:

# Error: `choose_cells` only works in interactive mode.
# Not working in Jupyter notebook or colab
# May try this function in R studio or
# use new kernel `xeus-r`
# choose_cells(cds)

AFD_genes <- c("gcy-8", "dac-1", "oig-8")
AFD_lineage_cds <- cds[rowData(cds)$gene_short_name %in% AFD_genes,
                       colData(cds)$cell.type %in% c("AFD")]

AFD_lineage_cds

class: cell_data_set 
dim: 3 326 
metadata(2): cds_version citations
assays(1): counts
rownames(3): WBGene00001535 WBGene00000895 WBGene00020582
rowData names(3): id gene_short_name num_cells_expressed
colnames(326): AAACCTGCAAGACGTG-300.1.1 ACCAGTATCGTAGGTT-300.1.1 ...
  GCTGCGATCTTCTGGC-b02 GGGCACTAGCCTTGAT-b02
colData names(19): cell n.umi ... bg.b01.loading bg.b02.loading
reducedDimNames(3): PCA Aligned UMAP
mainExpName: NULL
altExpNames(0):

The function plot_genes_in_pseudotime() takes a small set of genes and shows you their dynamics as a function of pseudotime:

plot_genes_in_pseudotime(AFD_lineage_cds,
                         min_expr=0.5)

png

As you can see, gene dac-1 is activated before the other two genes.

Reference

https://colab.research.google.com/drive/10fqFG9UVbazqeaZwbzpSAJ3I79tSoSYG#scrollTo=1onUusVNBvAr&line=1&uniqifier=1

https://cole-trapnell-lab.github.io/monocle3/docs/differential/#pseudo-dep

https://github.com/cole-trapnell-lab/monocle3/issues/179#issuecomment-2145687700