What are Phred scores
A Phred score measures the probability that a base call in a DNA sequencing read is incorrect. The scale is logarithmic: every 10-point increase in the Phred score corresponds to a tenfold decrease in the probability of an error.
\(Q = -10 \cdot \log_{10}(P)\)
Where:
- Q is the PHRED score.
- P is the probability that the base was called incorrectly.
For example:
- Q = 20: This corresponds to a 1 in 100 probability of an incorrect base call, or an accuracy of 99%.
- Q = 30: This corresponds to a 1 in 1000 probability of an incorrect base call, or an accuracy of 99.9%.
- Q = 40: This corresponds to a 1 in 10,000 probability of an incorrect base call, or an accuracy of 99.99%.
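The formula can be applied in both directions. The sketch below is a minimal illustration in R; the function names phred_to_prob and prob_to_phred are hypothetical helpers chosen for this example, not part of any package.

```r
# Convert a Phred score Q to the probability P of an incorrect base call:
# P = 10^(-Q / 10)
phred_to_prob <- function(q) {
  10^(-q / 10)
}

# Convert an error probability P back to a Phred score:
# Q = -10 * log10(P)
prob_to_phred <- function(p) {
  -10 * log10(p)
}

# The three examples from the list above
phred_to_prob(c(20, 30, 40))   # probabilities 0.01, 0.001, 0.0001

# And the inverse direction
prob_to_phred(c(0.01, 0.001))  # scores 20, 30
```

Both helpers are vectorized, so they accept a single value or a whole vector of scores.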
# Print the header
cat(sprintf("%-5s\t\t%-10s\n", "Phred", "Prob of"))
cat(sprintf("%-5s\t\t%-10s\n", "score", "Incorrect call"))
# Loop through Phred scores from 0 to 41
for (phred in 0:41) {
  cat(sprintf("%-5d\t\t%0.5f\n", phred, 10^(phred / -10)))
}
Phred Prob of
score Incorrect call
0 1.00000
1 0.79433
2 0.63096
3 0.50119
4 0.39811
5 0.31623
6 0.25119
7 0.19953
8 0.15849
9 0.12589
10 0.10000
11 0.07943
12 0.06310
13 0.05012
14 0.03981
15 0.03162
16 0.02512
17 0.01995
18 0.01585
19 0.01259
20 0.01000
21 0.00794
22 0.00631
23 0.00501
24 0.00398
25 0.00316
26 0.00251
27 0.00200
28 0.00158
29 0.00126
30 0.00100
31 0.00079
32 0.00063
33 0.00050
34 0.00040
35 0.00032
36 0.00025
37 0.00020
38 0.00016
39 0.00013
40 0.00010
41 0.00008
What is ASCII
ASCII (American Standard Code for Information Interchange) is used to represent characters in computers. We can represent Phred scores using ASCII characters. The advantage is that the quality information can be easily stored in a text-based FASTQ file.
Not all ASCII characters are printable. The first printable character after the space (decimal code 32) is !, whose decimal code is 33.
# Store output in a vector to fit on a slide
output <- c(sprintf("%-8s %-8s", "Character", "ASCII #"))
# Loop through ASCII values from 33 to 89
for (i in 33:89) {
  output <- c(output, sprintf("%-8s %-8d", intToUtf8(i), i))
}
# Print the output in a single block (e.g., to fit on a slide)
cat(paste(output, collapse = "\n"))
Character ASCII #
! 33
" 34
# 35
$ 36
% 37
& 38
' 39
( 40
) 41
* 42
+ 43
, 44
- 45
. 46
/ 47
0 48
1 49
2 50
3 51
4 52
5 53
6 54
7 55
8 56
9 57
: 58
; 59
< 60
= 61
> 62
? 63
@ 64
A 65
B 66
C 67
D 68
E 69
F 70
G 71
H 72
I 73
J 74
K 75
L 76
M 77
N 78
O 79
P 80
Q 81
R 82
S 83
T 84
U 85
V 86
W 87
X 88
Y 89
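The table maps decimal codes to characters with intToUtf8. Going the other way, from a character to its decimal code, uses base R's utf8ToInt. The short sketch below shows the round trip; the example string is made up for illustration.

```r
# Character to decimal code
utf8ToInt("!")   # 33

# Decimal code back to character
intToUtf8(33)    # "!"

# A multi-character string yields one code per character
codes <- utf8ToInt("!5?I")
codes            # 33 53 63 73

# Passing the codes back reproduces the original string
intToUtf8(codes) # "!5?I"
```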
Phred scores in FASTQ files
In a FASTQ file, Phred scores are represented as ASCII characters. These characters are converted back to numeric values (PHRED scores) based on the encoding scheme used:
- PHRED+33 Encoding (Sanger/Illumina 1.8+):
  - The ASCII character for a quality score Q is calculated as: ASCII character = chr(Q + 33)
  - For example, a PHRED score of 30 is encoded as chr(30 + 33) = chr(63), which corresponds to the ASCII character ?.
- PHRED+64 Encoding (Illumina 1.3-1.7):
  - The ASCII character for a quality score Q is calculated as: ASCII character = chr(Q + 64)
  - For example, a PHRED score of 30 is encoded as chr(30 + 64) = chr(94), which corresponds to the ASCII character ^.
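The two encoding rules differ only in the offset added to Q, so a single helper can cover both. The following is a minimal sketch, assuming the scores are given as a numeric vector; encode_quality is a hypothetical name used only for this example.

```r
# Encode a vector of Phred scores as a quality string.
# offset = 33 gives Phred+33, offset = 64 gives Phred+64.
encode_quality <- function(scores, offset = 33) {
  # intToUtf8() joins the resulting characters into one string by default
  intToUtf8(scores + offset)
}

scores <- c(30, 40, 20, 2)           # made-up scores for illustration

encode_quality(scores)               # "?I5#"  (Phred+33)
encode_quality(scores, offset = 64)  # "^hTB"  (Phred+64)
```

The same scores produce different strings under the two schemes, which is why the encoding must be known before a quality string can be interpreted.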
# Print the header
cat(sprintf("%-5s\t\t%-10s\t%-6s\t\t%-10s\n", "Phred", "Prob. of", "ASCII", "ASCII"))
cat(sprintf("%-5s\t\t%-10s\t%-6s\t%-10s\n", "score", "Error", "Phred+33", "Phred+64"))
# Loop through Phred scores from 0 to 41
for (phred in 0:41) {
  # Calculate the probability of error
  prob_error <- 10^(phred / -10)
  # Convert Phred scores to ASCII characters
  ascii_phred33 <- intToUtf8(phred + 33)
  ascii_phred64 <- intToUtf8(phred + 64)
  # Print the results in a formatted table
  cat(sprintf("%-5d\t\t%0.5f\t\t%-6s\t\t%-10s\n",
              phred, prob_error,
              ascii_phred33, ascii_phred64))
}
Phred Prob. of ASCII ASCII
score Error Phred+33 Phred+64
0 1.00000 ! @
1 0.79433 " A
2 0.63096 # B
3 0.50119 $ C
4 0.39811 % D
5 0.31623 & E
6 0.25119 ' F
7 0.19953 ( G
8 0.15849 ) H
9 0.12589 * I
10 0.10000 + J
11 0.07943 , K
12 0.06310 - L
13 0.05012 . M
14 0.03981 / N
15 0.03162 0 O
16 0.02512 1 P
17 0.01995 2 Q
18 0.01585 3 R
19 0.01259 4 S
20 0.01000 5 T
21 0.00794 6 U
22 0.00631 7 V
23 0.00501 8 W
24 0.00398 9 X
25 0.00316 : Y
26 0.00251 ; Z
27 0.00200 < [
28 0.00158 = \
29 0.00126 > ]
30 0.00100 ? ^
31 0.00079 @ _
32 0.00063 A `
33 0.00050 B a
34 0.00040 C b
35 0.00032 D c
36 0.00025 E d
37 0.00020 F e
38 0.00016 G f
39 0.00013 H g
40 0.00010 I h
41 0.00008 J i
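Reading a FASTQ file requires the reverse step: turning a quality string back into numeric Phred scores. The sketch below is one possible way to do this in base R. The name decode_quality and the example string are assumptions made for illustration, and the negative-score check is only a rough sanity test, not a reliable way to detect the encoding.

```r
# Decode a quality string into numeric Phred scores.
# offset = 33 for Phred+33, offset = 64 for Phred+64.
decode_quality <- function(qual, offset = 33) {
  scores <- utf8ToInt(qual) - offset
  # A negative score cannot occur, so it hints at a wrong offset
  if (any(scores < 0)) {
    stop("Negative score: the offset is probably wrong for this string")
  }
  scores
}

qual <- "II?5+#"                 # made-up quality string for illustration

scores <- decode_quality(qual)   # assumes Phred+33
scores                           # 40 40 30 20 10 2

# Error probability for each base, using P = 10^(-Q / 10)
10^(-scores / 10)
```

Decoding the same string with offset = 64 stops with an error, because characters such as # (decimal code 35) fall below 64 and would give negative scores.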