Initial analysis of 10x scRNA-seq data for C. elegans using cellranger

Data availability

The data can be obtained from GEO under accession GSE126954: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126954

GSM3618670 UW synchronized 300 min post bleach
GSM3618671 UW synchronized 400 min post bleach
GSM3618672 UW synchronized 500 min post bleach batch 1
GSM3618673 UW synchronized 500 min post bleach batch 2
GSM3618674 UPenn mixed embryo batch r17
GSM3618675 UPenn mixed embryo batch b01
GSM3618676 UPenn mixed embryo batch b02

In this class, you will only work on: 300 min post bleach, 400 min post bleach and 500 min post bleach batch 1

A copy of the data is stored here: /scratch/zt1/project/bioi611/shared/raw_data/10x_cele_data/scRNA/

If you want to download and prepare the files yourself, follow the process below.

For each sample, fetch the SRA files using prefetch.

For example:

export PATH=/scratch/zt1/project/bioi611/shared/software/sratoolkit.3.1.1-centos_linux64/bin:$PATH
prefetch SRR8611967

Convert the SRA file to FASTQ files

fasterq-dump  --outdir <output_folder>  --include-technical --split-files <sra_file>
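
The two commands above can be wrapped in a small helper so that every run goes through the same steps. The sketch below is only an illustration: `run_cmd`, `fetch_run`, the `DRY_RUN` switch and the `fastq/` output folder are made-up names, and passing the accession to fasterq-dump assumes you run it from the directory where prefetch saved the download. By default it only prints the commands, so you can check them before running anything.

```shell
# Sketch only: run_cmd, fetch_run and DRY_RUN are illustrative names,
# not part of sratoolkit.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fetch_run() {
    # $1 = SRA run accession, $2 = output folder for the FASTQ files
    srr="$1"
    outdir="$2"
    run_cmd prefetch "$srr"
    # Assumption: prefetch wrote its output under the current directory,
    # so the accession alone is enough for fasterq-dump to find it.
    run_cmd fasterq-dump --outdir "$outdir" --include-technical --split-files "$srr"
}

# SRR8611967 is the example run from above; fastq/ is a hypothetical folder.
fetch_run SRR8611967 fastq/SRR8611967
```

Set DRY_RUN=0 in the environment once the printed commands look right.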

Prepare the genome

You don't need to run this step. This part only shows how to prepare the reference genome for cellranger, in case you ever need to do it yourself.

## /scratch/zt1/project/bioi611/shared/reference/cellranger_mkref/
$ cat  /scratch/zt1/project/bioi611/shared/reference/cellranger_mkref/scRNA_cellranger_mkref.sub
#!/bin/bash
#SBATCH --partition=standard
#SBATCH -t 40:00:00
#SBATCH -n 1
#SBATCH -c 26
#SBATCH --mem=250g
#SBATCH --job-name=scRNA_cellranger_mkref
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
export PATH=/scratch/zt1/project/bioi611/shared/software/cellranger-8.0.1/bin:$PATH
cellranger mkgtf ../Caenorhabditis_elegans.WBcel235.111.gtf \
                 Caenorhabditis_elegans.WBcel235.111.filtered.gtf \
                 --attribute=gene_biotype:protein_coding > scRNA_cellranger_mkref.filter_gtf.log 2>&1
cellranger mkref --genome=Caenorhabditis_elegans_genome \
                 --fasta=../Caenorhabditis_elegans.WBcel235.dna.toplevel.fa \
                 --genes=Caenorhabditis_elegans.WBcel235.111.filtered.gtf \
                 > scRNA_cellranger_mkref.log 2>&1
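
The mkgtf call above keeps only records whose gene_biotype is protein_coding. To get a feeling for what that filter removes, you could tally the biotypes in the unfiltered GTF first. The helper below is a minimal sketch, assuming the Ensembl attribute layout (gene_biotype "..." in column 9); `count_biotypes` is a made-up name, not a cellranger command.

```shell
# Sketch only: count_biotypes is an illustrative helper.
count_biotypes() {
    # $1 = GTF file. Tally gene_biotype values on "gene" feature lines.
    awk -F'\t' '$3 == "gene" {
        if (match($9, /gene_biotype "[^"]+"/)) {
            # skip the 14 characters of: gene_biotype "  and drop the closing quote
            bt = substr($9, RSTART + 14, RLENGTH - 15)
            n[bt]++
        }
    }
    END { for (bt in n) print n[bt] "\t" bt }' "$1" | sort -k1,1nr -k2,2
}

# Example call (path taken from the script above):
# count_biotypes ../Caenorhabditis_elegans.WBcel235.111.gtf
```

The output has one line per biotype, with the count first, sorted from most to least frequent.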

Run `cellranger count`

sbatch /scratch/zt1/project/bioi611/shared/scripts/scRNA_10x_cele_cellranger_count.Uwsync_300min.sub
sbatch /scratch/zt1/project/bioi611/shared/scripts/scRNA_10x_cele_cellranger_count.Uwsync_400min.sub
sbatch /scratch/zt1/project/bioi611/shared/scripts/scRNA_10x_cele_cellranger_count.Uwsync_500min.sub
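
The content of these submission scripts is not shown here. As a rough idea of what such a script might run, the sketch below assembles a `cellranger count` command and prints it. Everything passed in is an assumption: `build_count_cmd` is a made-up helper, the FASTQ folder and sample prefix are placeholders, the transcriptome path assumes mkref was run inside the cellranger_mkref folder shown earlier, and the core and memory values are arbitrary. Note that cellranger 8 requires `--create-bam` to be set explicitly.

```shell
# Sketch only: build_count_cmd is an illustrative helper and the values
# below are placeholders, not the settings used in the course scripts.
build_count_cmd() {
    # $1 = run id (also the output folder name)
    # $2 = reference folder produced by cellranger mkref
    # $3 = folder containing the FASTQ files
    # $4 = sample prefix of the FASTQ file names
    echo cellranger count \
        --id="$1" \
        --transcriptome="$2" \
        --fastqs="$3" \
        --sample="$4" \
        --create-bam=true \
        --localcores=16 \
        --localmem=64
}

# Assumed reference location: mkref output inside the cellranger_mkref folder.
REF=/scratch/zt1/project/bioi611/shared/reference/cellranger_mkref/Caenorhabditis_elegans_genome

# "fastq_folder" and "sample_prefix" are hypothetical placeholders.
build_count_cmd Uwsync_300min "$REF" fastq_folder sample_prefix
```

Printing the command first makes it easy to review before pasting it into an sbatch script.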

Aggregate the `cellranger count` results

Many experiments generate data for multiple samples. Depending on the experimental design, these samples may represent replicates from the same set of cells, cells from different tissues or time points from the same individual, or cells from different individuals. These samples may be processed in separate Gel Bead-in Emulsion (GEM) wells or multiplexed within the same GEM well on Chromium instruments.

To work with data from multiple GEM wells, you can aggregate and analyze the outputs from multiple runs of each of these pipelines using cellranger aggr.

sbatch /scratch/zt1/project/bioi611/shared/scripts/scRNA_10x_cele_cellranger_aggr.sub
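
cellranger aggr reads a CSV file that lists one `cellranger count` run per row, with the columns sample_id and molecule_h5. The content of the submission script above is not shown, so the following is only a sketch of how such a CSV might be produced: `make_aggr_csv` is a made-up helper, and both the run folder and the sample ids (borrowed from the script names above) are assumptions.

```shell
# Sketch only: make_aggr_csv is an illustrative helper.
make_aggr_csv() {
    # $1 = folder that contains the cellranger count output folders
    # remaining arguments = ids used for the cellranger count runs
    root="$1"
    shift
    echo "sample_id,molecule_h5"
    for id in "$@"; do
        # each cellranger count run writes <id>/outs/molecule_info.h5
        echo "${id},${root}/${id}/outs/molecule_info.h5"
    done
}

# /path/to/count_runs is a hypothetical location for the count outputs.
make_aggr_csv /path/to/count_runs Uwsync_300min Uwsync_400min Uwsync_500min

# To use it, redirect the output to a file and pass it to cellranger aggr:
#   make_aggr_csv /path/to/count_runs Uwsync_300min Uwsync_400min Uwsync_500min > aggr.csv
#   cellranger aggr --id=Uwsync_aggr --csv=aggr.csv
```

The `--id` value Uwsync_aggr is also just a placeholder for the name of the output folder.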
