DAVID Tool Suite Overview
Introduction
The original version of DAVID introduced a tool suite designed primarily for batch gene annotation and GO term enrichment analysis, enabling researchers to identify the most relevant biological processes associated with a given gene list. While the core enrichment algorithm has remained consistent across all versions, the annotation coverage has significantly expanded. Initially limited to GO terms, DAVID now supports a wide range of annotation categories, including:
- Gene Ontology (GO) terms
- Protein–protein interactions
- Protein functional domains
- Disease associations
- Biological pathways (e.g., KEGG, BioCarta)
- Sequence features
- Functional summaries
- Tissue expression
- Literature references, and more
This expanded coverage allows researchers to explore their gene lists from multiple biological perspectives — all within a single platform. Results can be viewed as individual annotation chart reports or as combined summary reports, offering flexibility depending on the analysis goals.
A notable feature of DAVID is its ability to accept custom gene backgrounds, which is rarely available in other web-based enrichment tools. This option enables more tailored and accurate analyses, particularly when comparing against a relevant experimental or platform-specific background rather than the whole genome.
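To illustrate why the choice of background matters, the sketch below runs the one-sided Fisher's exact test (described in the statistics section of this document) for the same overlap against two backgrounds. The helper `enrichment_p` and all counts are hypothetical and chosen only for illustration; the platform-specific numbers in particular are assumptions.

```r
# One-sided Fisher's exact test for a single annotation term.
# list_hits / list_total describe the gene list,
# pop_hits / pop_total describe the chosen background.
enrichment_p <- function(list_hits, list_total, pop_hits, pop_total) {
  tab <- matrix(c(list_hits,            list_total - list_hits,
                  pop_hits - list_hits, pop_total - list_total - (pop_hits - list_hits)),
                nrow = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}

# Hypothetical counts: the same 3-of-300 overlap against two backgrounds
p_genome   <- enrichment_p(3, 300, pop_hits = 40, pop_total = 30000)  # whole genome
p_platform <- enrichment_p(3, 300, pop_hits = 35, pop_total = 12000)  # genes on the assay (assumed)

c(genome = p_genome, platform = p_platform)
```

With these assumed counts, the narrower background yields a larger p-value, because the pathway makes up a bigger share of the genes that could have been detected at all.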
A Typical Analysis Flow
Load Gene List → View Summary Page → Explore details through Chart Report, Table Report, Clustering Report, etc. → Export and Save Results.
Understanding the Statistical Test in DAVID
In DAVID, Fisher’s Exact Test is used to determine whether a particular biological function or pathway (annotation term) contains more genes from your input list than would be expected by random chance.
Why Fisher’s Exact Test?
We compare two groups:
| Group | Description |
|---|---|
| Your gene list | e.g., DEG list / experimental genes |
| All other genes | the rest of the genome (background) |
And we check whether genes in your list are over-represented in a specific biological category — such as p53 signaling pathway, cell division, etc.
Contingency Table (2 × 2)
Each analysis builds a table like this:
| | Your gene list | Background genes | Total |
|---|---|---|---|
| In pathway | a | c | a + c |
| Not in pathway | b | d | b + d |
| Total | a + b | c + d | n |
The (1,1) cell — “a” — is the number of genes both in your list and in the pathway.
This value is called n₁₁.
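As a rough sketch of how the four cells can be counted from gene identifiers, the example below uses base R set membership. The identifiers, the 20-gene background, and the variable names are all hypothetical.

```r
# Hypothetical gene identifiers, for illustration only
background <- paste0("G", 1:20)                 # all genes considered
gene_list  <- c("G1", "G2", "G3", "G4", "G5")   # your gene list
pathway    <- c("G1", "G2", "G10", "G11")       # genes annotated to the pathway

in_list    <- background %in% gene_list
in_pathway <- background %in% pathway

n_a <- sum(in_list & in_pathway)     # a: in list, in pathway
n_b <- sum(in_list & !in_pathway)    # b: in list, not in pathway
n_c <- sum(!in_list & in_pathway)    # c: not in list, in pathway
n_d <- sum(!in_list & !in_pathway)   # d: not in list, not in pathway

# Same layout as the table above: rows = pathway status, columns = list status
tab <- matrix(c(n_a, n_c,
                n_b, n_d),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("In pathway", "Not in pathway"),
                              c("Gene list", "Background")))
print(tab)
```

Every background gene falls into exactly one cell, so the four counts sum to the size of the background.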
How the p-value Is Calculated
Fisher’s Exact Test asks:
Is the number of genes in our list that fall into this pathway (n₁₁) higher than we would expect just by chance?
So it calculates the probability of observing this table together with all other tables that have the same row and column totals and are even more extreme (i.e., with higher values of n₁₁).
Mathematically:
$$p = \sum_{T \in A} P(T)$$
Where:
- A = all possible tables where (1,1) ≥ observed n₁₁
- P(T) = the probability of table T, computed under the null hypothesis of independence.
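For reference, with the row and column totals held fixed, the probability of a single table under independence is given by the hypergeometric distribution. Written in the notation of the contingency table above (a, b, c, d, n), with k standing for a candidate value of n₁₁, a standard form is:

$$P(n_{11} = k) = \frac{\binom{a+c}{k}\,\binom{b+d}{(a+b)-k}}{\binom{n}{a+b}}, \qquad p = \sum_{k=a}^{\min(a+b,\;a+c)} P(n_{11} = k)$$

Here a + b is the size of the gene list, a + c is the number of pathway genes, and the sum runs from the observed overlap a up to the largest overlap the totals allow.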
Meaning:
The probability that the observed overlap (or more) happens just by chance.
Small p-value (e.g., < 0.05) → the overlap is unlikely to arise by chance → significant enrichment.
In Simple Words
DAVID asks:
“Did we find more genes in this pathway than we would expect if we randomly picked genes from the genome?”
- If yes → significant enrichment
- If no → random noise
Hypothetical Example
Consider the human genome as the background, containing 30,000 genes; this is the Population Total (PT). Among these genes, 40 are known to participate in the p53 signaling pathway, referred to as the Population Hits (PH).
Now assume that, in your experimental gene list of 300 genes (the List Total, LT), three genes (the List Hits, LH) are associated with the p53 signaling pathway.
The question is:
➡️ Is the proportion 3/300 (1%) in our list significantly higher than the background proportion of 40/30,000 (about 0.13%)? In other words, is this enrichment more than what would be expected by random chance?
Fisher’s Exact Test is used in DAVID to statistically evaluate whether the observed enrichment of p53-related genes in the list is significant compared with the genomic background.
```r
LH <- 3      # List Hits: genes in the list that are in the pathway
LT <- 300    # List Total
PH <- 40     # Population Hits: background genes in the pathway
PT <- 30000  # Population Total

# Contingency table for Fisher's test
# Row 1: genes in the list     (in pathway, not in pathway)
# Row 2: genes not in the list (in pathway, not in pathway)
table_fisher <- matrix(c(LH,
                         LT - LH,
                         PH - LH,
                         PT - LT - (PH - LH)),
                       nrow = 2,
                       byrow = TRUE)

# Fisher's Exact Test (right-sided)
fisher.test(table_fisher, alternative = "greater")
```

```
	Fisher's Exact Test for Count Data

data:  table_fisher
p-value = 0.007443
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 2.105361      Inf
sample estimates:
odds ratio 
  8.096268 
```
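The right-sided p-value can also be cross-checked without `fisher.test`. The sketch below computes the same upper tail directly from the hypergeometric distribution with `phyper`, then approximates it by simulating random gene lists with `rhyper`. The seed and the number of simulations are arbitrary choices made for this illustration.

```r
LH <- 3; LT <- 300; PH <- 40; PT <- 30000

# Exact upper tail: P(X >= LH) when LT genes are drawn at random
# from PT genes, of which PH belong to the pathway
p_exact <- phyper(LH - 1, PH, PT - PH, LT, lower.tail = FALSE)

# Monte Carlo check: simulate many random gene lists of size LT
set.seed(1)                                # arbitrary seed, for reproducibility
hits  <- rhyper(100000, PH, PT - PH, LT)   # pathway genes in each random list
p_sim <- mean(hits >= LH)                  # share of random lists with >= LH hits

c(exact = p_exact, simulated = p_sim)
```

The simulated value fluctuates from run to run but should sit close to the exact one. This is the "randomly picking genes from the genome" idea made literal.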
What Does a Small p-value Mean?
If the p-value is small (e.g., 0.007):
✔ Genes from your list appear in that pathway more often than expected by random chance → significant enrichment
✔ Suggests biological association
✔ The pathway is a reasonable candidate for follow-up in your experiment
If p > 0.05 → there is no strong evidence of enrichment.
What About the EASE Score? (Modified Fisher’s Test)
The EASE Score is a more conservative variant of Fisher’s Exact Test used by DAVID.
It works by subtracting one gene from the List Hits (LH) before computing the p-value.
Why subtract one?
It penalizes weak evidence, especially when enrichment is supported by only one or a few genes.
This reduces false positives among terms that are supported by very few genes.
```r
# Adjusted List Hits for EASE Score
# Only the List Hits are reduced; LT, PH and PT stay as defined above
LH_ease <- LH - 1

# Contingency table for EASE Score
# Row 1: genes in the list     (in pathway, not in pathway)
# Row 2: genes not in the list (in pathway, not in pathway)
table_ease <- matrix(c(LH_ease,
                       LT - LH_ease,
                       PH - LH_ease,
                       PT - LT - (PH - LH_ease)),
                     nrow = 2,
                     byrow = TRUE)

# EASE Score = Modified Fisher's Exact Test
fisher.test(table_ease, alternative = "greater")
```

```
	Fisher's Exact Test for Count Data

data:  table_ease
p-value = 0.06063
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.8956195       Inf
sample estimates:
odds ratio 
  5.238625 
```
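The same adjustment can be wrapped in a small function. This is a minimal sketch that mirrors the construction used in this section (one gene removed from the List Hits, all totals unchanged); the name `ease_score` is hypothetical, and DAVID's own implementation may differ in detail.

```r
# EASE-style p-value following the construction in this section:
# remove one gene from the List Hits, keep LT, PH and PT unchanged
ease_score <- function(LH, LT, PH, PT) {
  # P(X >= LH - 1) under the hypergeometric null
  phyper(LH - 2, PH, PT - PH, LT, lower.tail = FALSE)
}

ease_score(3, 300, 40, 30000)   # three List Hits
ease_score(1, 300, 40, 30000)   # a single List Hit
```

In this section's example, the adjustment raises the p-value from about 0.007 to about 0.06, so the term no longer passes the conventional 0.05 threshold. With a single List Hit the adjusted overlap is zero, so the score is 1 and the term can never appear enriched.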
Functional Annotation Summary
Functional Annotation Chart Report
Functional Annotation Clustering
Because many annotation terms are biologically related, the Functional Annotation Chart often shows multiple similar annotations repeatedly. This redundancy can make it difficult to focus on the key biological themes.
To address this issue, DAVID provides the Functional Annotation Clustering feature. Instead of listing terms individually, this tool groups similar annotations together, allowing researchers to interpret results more clearly and at a higher biological level than with a traditional chart report.
The clustering method is based on the principle that similar annotation terms tend to share similar gene members. DAVID uses:
- Kappa statistics to quantify the similarity between annotation terms by measuring the degree of shared genes.
- Fuzzy heuristic clustering (previously used in the Gene Functional Classification Tool) to group annotation terms according to their Kappa values.
In essence, the more genes two annotations share, the more likely they are placed in the same cluster.
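To make the similarity measure concrete, the sketch below computes Cohen's kappa between two annotation terms, treating each term as a TRUE/FALSE membership vector over the same genes. The function name `kappa_terms` and the gene memberships are hypothetical, and this is only an illustration of the statistic, not DAVID's actual code.

```r
# Cohen's kappa between two annotation terms.
# x, y: logical vectors over the same genes (TRUE = gene carries the term)
kappa_terms <- function(x, y) {
  stopifnot(length(x) == length(y))
  p_obs    <- mean(x == y)                              # observed agreement
  p_chance <- mean(x) * mean(y) + mean(!x) * mean(!y)   # agreement expected by chance
  (p_obs - p_chance) / (1 - p_chance)
}

# Hypothetical memberships over a 10-gene list
genes  <- 1:10
term_a <- genes %in% 1:5
term_b <- genes %in% c(1:4, 6)   # shares four genes with term_a
term_c <- genes %in% 7:10        # shares no genes with term_a

kappa_terms(term_a, term_b)   # high overlap, positive kappa
kappa_terms(term_a, term_c)   # no overlap, negative kappa
```

Kappa corrects the raw agreement for what two unrelated terms of the same sizes would share by chance. Values near 1 indicate nearly identical gene membership, and values at or below 0 indicate no more overlap than chance.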
Interpretation of Results
- The p-values for each annotation term within a cluster are identical to those shown in the regular Functional Annotation Chart (e.g., Fisher’s Exact Test / EASE Score).
- Each annotation cluster is assigned a Group Enrichment Score, calculated as the geometric mean of the member p-values (in -log₁₀ scale).
- Clusters with higher Enrichment Scores indicate that their member terms consistently have lower p-values, reflecting stronger biological relevance.
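The score described above can be sketched in a few lines. The function name `enrichment_score` and the three p-values are hypothetical, for illustration only.

```r
# Group Enrichment Score: -log10 of the geometric mean of member p-values
enrichment_score <- function(p) {
  stopifnot(all(p > 0 & p <= 1))
  -log10(exp(mean(log(p))))   # exp(mean(log(p))) is the geometric mean
}

# Hypothetical p-values for three terms in one cluster
p_members <- c(0.001, 0.01, 0.1)
enrichment_score(p_members)   # same value as mean(-log10(p_members))
```

Because the geometric mean is taken on the -log₁₀ scale, the score equals the ordinary average of the members' -log₁₀ p-values. A score of about 1.3 corresponds to a geometric-mean p-value of 0.05.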
Reference
https://davidbioinformatics.nih.gov/helps/functional_annotation.html