Open In Colab

Isoform-level differential expression analysis with Ballgown.

Install R package

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ballgown")
'getOption("repos")' replaces Bioconductor standard repositories, see
'help("repositories", package = "BiocManager")' for details.
Replacement repositories:
    CRAN: https://cran.rstudio.com

Bioconductor version 3.19 (BiocManager 1.30.25), R 4.4.1 (2024-06-14)

Warning message:
“package(s) not installed when version(s) same as or greater than current; use
  `force = TRUE` to re-install: 'ballgown'”
Old packages: 'Matrix'

Understand the folder with example data

library(ballgown)
data_directory = system.file('extdata', package='ballgown') # automatically finds ballgown's installation directory
# examine data_directory:
data_directory
Attaching package: ‘ballgown’


The following object is masked from ‘package:base’:

    structure

'/usr/local/lib/R/site-library/ballgown/extdata'

list.files(data_directory)
  1. 'annot.gtf.gz'
  2. 'hg19_genes_small.gtf.gz'
  3. 'sample01'
  4. 'sample02'
  5. 'sample03'
  6. 'sample04'
  7. 'sample05'
  8. 'sample06'
  9. 'sample07'
  10. 'sample08'
  11. 'sample09'
  12. 'sample10'
  13. 'sample11'
  14. 'sample12'
  15. 'sample13'
  16. 'sample14'
  17. 'sample15'
  18. 'sample16'
  19. 'sample17'
  20. 'sample18'
  21. 'sample19'
  22. 'sample20'
  23. 'tiny.genes.results.gz'
  24. 'tiny.isoforms.results.gz'
  25. 'tiny2.genes.results.gz'
  26. 'tiny2.isoforms.results.gz'
# make the ballgown object:
bg = ballgown(dataDir=data_directory, samplePattern='sample', meas='all')
bg
Mon Oct 21 17:42:05 2024

Mon Oct 21 17:42:05 2024: Reading linking tables

Mon Oct 21 17:42:05 2024: Reading intron data files

Mon Oct 21 17:42:05 2024: Merging intron data

Mon Oct 21 17:42:05 2024: Reading exon data files

Mon Oct 21 17:42:05 2024: Merging exon data

Mon Oct 21 17:42:05 2024: Reading transcript data files

Mon Oct 21 17:42:05 2024: Merging transcript data

Wrapping up the results

Mon Oct 21 17:42:05 2024




ballgown instance with 100 transcripts and 20 samples

Accessing assembly data

A ballgown object has six slots: structure, expr, indexes, dirs, mergedDate, and meas.

Exon, intron, and transcript structures are easily extracted from the main ballgown object:

Structure

structure(bg)$exon

GRanges object with 633 ranges and 2 metadata columns:
        seqnames            ranges strand |        id transcripts
           <Rle>         <IRanges>  <Rle> | <integer> <character>
    [1]       18 24412069-24412331      * |        12          10
    [2]       22 17308271-17308950      + |        55          25
    [3]       22 17309432-17310226      + |        56          25
    [4]       22 18121428-18121652      + |        88          35
    [5]       22 18138428-18138598      + |        89          35
    ...      ...               ...    ... .       ...         ...
  [629]       22 51221929-51222113      - |      3777        1294
  [630]       22 51221319-51221473      - |      3782        1297
  [631]       22 51221929-51222162      - |      3783        1297
  [632]       22 51221929-51222168      - |      3784        1301
  [633]        6 31248149-31248334      * |      3794        1312
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
structure(bg)$intron

GRanges object with 536 ranges and 2 metadata columns:
        seqnames            ranges strand |        id         transcripts
           <Rle>         <IRanges>  <Rle> | <integer>         <character>
    [1]       22 17308951-17309431      + |        33                  25
    [2]       22 18121653-18138427      + |        57                  35
    [3]       22 18138599-18185008      + |        58                  35
    [4]       22 18185153-18209442      + |        59                  35
    [5]       22 18385514-18387397      - |        72                  41
    ...      ...               ...    ... .       ...                 ...
  [532]       22 51216410-51220615      - |      2750 c(1294, 1297, 1301)
  [533]       22 51220776-51221928      - |      2756                1294
  [534]       22 51220780-51221318      - |      2757                1297
  [535]       22 51221474-51221928      - |      2758                1297
  [536]       22 51220780-51221928      - |      2759                1301
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
structure(bg)$trans

GRangesList object of length 100:
$`10`
GRanges object with 1 range and 2 metadata columns:
      seqnames            ranges strand |        id transcripts
         <Rle>         <IRanges>  <Rle> | <integer> <character>
  [1]       18 24412069-24412331      * |        12          10
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

$`25`
GRanges object with 2 ranges and 2 metadata columns:
      seqnames            ranges strand |        id transcripts
         <Rle>         <IRanges>  <Rle> | <integer> <character>
  [1]       22 17308271-17308950      + |        55          25
  [2]       22 17309432-17310226      + |        56          25
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

$`35`
GRanges object with 4 ranges and 2 metadata columns:
      seqnames            ranges strand |        id transcripts
         <Rle>         <IRanges>  <Rle> | <integer> <character>
  [1]       22 18121428-18121652      + |        88          35
  [2]       22 18138428-18138598      + |        89          35
  [3]       22 18185009-18185152      + |        90          35
  [4]       22 18209443-18212080      + |        91          35
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

...
<97 more elements>

expr

The expr slot is a list that contains tables of expression data for the genomic features. These tables are very similar to the *_data.ctab Tablemaker output files. Ballgown implements the following syntax to access components of the expr slot:

*expr(ballgown_object_name, <EXPRESSION_MEASUREMENT>)

where * is either e for exon, i for intron, t for transcript, or g for gene, and is an expression-measurement column name from the appropriate .ctab file. Gene-level measurements are calculated by aggregating the transcript-level measurements for that gene. All of the following are valid ways to extract expression data from the bg ballgown object:

transcript_fpkm = texpr(bg, 'FPKM')
transcript_cov = texpr(bg, 'cov')
whole_tx_table = texpr(bg, 'all')
exon_mcov = eexpr(bg, 'mcov')
junction_rcount = iexpr(bg)
whole_intron_table = iexpr(bg, 'all')
gene_expression = gexpr(bg)

Indexes

pData(bg) = data.frame(id=sampleNames(bg), group=rep(c(1,0), each=10))

pData(bg)
A data.frame: 20 × 2
idgroup
<chr><dbl>
sample011
sample021
sample031
sample041
sample051
sample061
sample071
sample081
sample091
sample101
sample110
sample120
sample130
sample140
sample150
sample160
sample170
sample180
sample190
sample200

Plotting transcript structures

plotTranscripts(gene='XLOC_000454', gown=bg, samples='sample12',
    meas='FPKM', colorby='transcript',
    main='transcripts from gene XLOC_000454: sample 12, FPKM')

png

It is also possible to plot several samples at once:

plotTranscripts('XLOC_000454', bg,
    samples=c('sample01', 'sample06', 'sample12', 'sample19'),
    meas='FPKM', colorby='transcript')

png


You can also make side-by-side plots comparing mean abundances between groups (here, 0 and 1):

plotMeans('XLOC_000454', bg, groupvar='group', meas='FPKM', colorby='transcript')

png

Differential expression analysis

Ballgown provides a wide selection of simple, fast statistical methods for testing whether transcripts are differentially expressed between experimental conditions or across a continuous covariate (such as time).

stat_results = stattest(bg, feature='transcript',
                   meas='FPKM', covariate='group',
                   getFC=TRUE)

results_transcripts <- data.frame(geneNames = geneNames(bg),
                                  geneIDs = geneIDs(bg),
                                  transcriptNames = transcriptNames(bg),
                                  stat_results)
head(results_transcripts)
A data.frame: 6 × 8
geneNamesgeneIDstranscriptNamesfeatureidfcpvalqval
<chr><chr><chr><chr><chr><dbl><dbl><dbl>
10XLOC_000010TCONS_00000010transcript103.1934990.013815760.10521233
25XLOC_000014TCONS_00000017transcript251.5490930.267736220.79114975
35XLOC_000017TCONS_00000020transcript354.3886260.010850700.08951825
41XLOC_000246TCONS_00000598transcript411.4405190.471080190.90253747
45XLOC_000019TCONS_00000024transcript451.7143400.084029480.48934813
67XLOC_000255TCONS_00000613transcript672.5185240.273173850.79114975
results_transcripts <- results_transcripts[order(results_transcripts$qval), ]
head(results_transcripts, 10)
A data.frame: 10 × 8
geneNamesgeneIDstranscriptNamesfeatureidfcpvalqval
<chr><chr><chr><chr><chr><dbl><dbl><dbl>
1225XLOC_000440TCONS_00001129transcript1225 5.674371221.035753e-050.001025395
980XLOC_000179TCONS_00000452transcript980 6.253449212.514632e-050.001244743
469XLOC_000101TCONS_00000244transcript469 119.299999382.398681e-040.007915648
695XLOC_000354TCONS_00000883transcript695 0.219509593.302059e-040.008172596
1012XLOC_000409TCONS_00001041transcript1012 0.246644341.527175e-030.030238073
123XLOC_000029TCONS_00000059transcript123 0.016033452.097875e-030.034614939
961XLOC_000176TCONS_00000435transcript961 4.960743992.736075e-030.038695918
880XLOC_000531TCONS_00001277transcript880 29.404852363.272859e-030.040501628
1063XLOC_000197TCONS_00000487transcript1063 3.129881874.555313e-030.050108442
35XLOC_000017TCONS_00000020transcript35 4.388625901.085070e-020.089518247
results_transcripts[results_transcripts$geneIDs == "XLOC_000101", ]
A data.frame: 2 × 8
geneNamesgeneIDstranscriptNamesfeatureidfcpvalqval
<chr><chr><chr><chr><chr><dbl><dbl><dbl>
469XLOC_000101TCONS_00000244transcript469119.2999990.00023986810.007915648
477XLOC_000101TCONS_00000252transcript477 3.1281840.45118036320.902537469
plotMeans('XLOC_000101', bg, groupvar='group', meas='FPKM', colorby='transcript')

png

Reference

https://www.bioconductor.org/packages/release/bioc/vignettes/ballgown/inst/doc/ballgown.html