Sequence Symbols
GCG uses the letter codes for amino acids and for nucleotide ambiguity proposed by the IUB (Nomenclature Committee, 1985, Eur. J. Biochem. 150: 1-5). These codes are compatible with those used by the EMBL, GenBank, and PIR data libraries.
Nucleotides
The meaning of each symbol, its complement, and the Cambridge (Staden) equivalents are shown below.
IUB/GCG              Meaning                 Complement  Staden/Sanger
A                    A                       T           A
C                    C                       G           C
G                    G                       C           G
T/U                  T                       A           T
M                    A or C                  K           5
R                    A or G                  Y           R
W                    A or T                  W           7
S                    C or G                  S           8
Y                    C or T                  R           Y
K                    G or T                  M           6
V                    A or C or G             B           not supported
H                    A or C or T             D           not supported
D                    A or G or T             H           not supported
B                    C or G or T             V           not supported
X/N                  G or A or T or C        X           -/X
. (gap character)    not G or A or T or C    .           not supported
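
The complement column can be applied mechanically to reverse-complement a sequence that contains ambiguity symbols. The following is a minimal Python sketch, not GCG code: the names COMPLEMENT and reverse_complement are hypothetical, U is treated as T, and N is assumed to complement to N (the table lists X for the combined X/N row).

```python
# Complement of each IUB/GCG nucleotide symbol, copied from the table above.
# Assumptions: U is treated like T, and N maps to N rather than X.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "X": "X", "N": "N", ".": ".",
}


def reverse_complement(seq):
    """Return the reverse complement of an IUB-coded nucleotide sequence."""
    # Walk the sequence backwards and complement each symbol in turn.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("GAY"))         # RTC
print(reverse_complement("ACGTMRWSYK"))  # MRSWYKACGT
```

A symbol that is not in the table raises KeyError, which is usually preferable to silently passing an unknown character through.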
Amino Acids
Here is a list of the standard one-letter amino acid codes and their three-letter equivalents, together with the synonymous codons for each amino acid and their depiction in the IUB codes. Note that the ambiguous codons following semicolons (;) are not sufficiently specific to define a single amino acid, even though they represent the best possible backtranslation into the IUB codes.
IUB Symbol  3-letter  Meaning                    Codons                   Depiction
A           Ala       Alanine                    GCT,GCC,GCA,GCG          !GCX
B           Asp,Asn   Aspartic acid, Asparagine  GAT,GAC,AAT,AAC          !RAY
C           Cys       Cysteine                   TGT,TGC                  !TGY
D           Asp       Aspartic acid              GAT,GAC                  !GAY
E           Glu       Glutamic acid              GAA,GAG                  !GAR
F           Phe       Phenylalanine              TTT,TTC                  !TTY
G           Gly       Glycine                    GGT,GGC,GGA,GGG          !GGX
H           His       Histidine                  CAT,CAC                  !CAY
I           Ile       Isoleucine                 ATT,ATC,ATA              !ATH
K           Lys       Lysine                     AAA,AAG                  !AAR
L           Leu       Leucine                    TTG,TTA,CTT,CTC,CTA,CTG  !TTR,CTX,YTR;YTX
M           Met       Methionine                 ATG                      !ATG
N           Asn       Asparagine                 AAT,AAC                  !AAY
P           Pro       Proline                    CCT,CCC,CCA,CCG          !CCX
Q           Gln       Glutamine                  CAA,CAG                  !CAR
R           Arg       Arginine                   CGT,CGC,CGA,CGG,AGA,AGG  !CGX,AGR,MGR;MGX
S           Ser       Serine                     TCT,TCC,TCA,TCG,AGT,AGC  !TCX,AGY;WSX
T           Thr       Threonine                  ACT,ACC,ACA,ACG          !ACX
V           Val       Valine                     GTT,GTC,GTA,GTG          !GTX
W           Trp       Tryptophan                 TGG                      !TGG
X           Xxx       Unknown                                             !XXX
Y           Tyr       Tyrosine                   TAT,TAC                  !TAY
Z           Glu,Gln   Glutamic acid, Glutamine   GAA,GAG,CAA,CAG          !SAR
*           End       Terminator                 TAA,TAG,TGA              !TAR,TRA;TRR
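
To see why a depiction that follows a semicolon is not specific, expand it into the concrete codons it matches and compare them with the codon list for that amino acid. The sketch below does this for leucine. It is an illustration only: the names IUB, expand, and LEU are hypothetical, and the gap character is left out.

```python
from itertools import product

# Bases matched by each IUB/GCG nucleotide symbol, from the nucleotide table.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def expand(pattern):
    """Return the set of unambiguous codons matched by an IUB pattern."""
    return {"".join(bases) for bases in product(*(IUB[c] for c in pattern))}


# Leucine codons as listed in the amino acid table.
LEU = {"TTG", "TTA", "CTT", "CTC", "CTA", "CTG"}

# The depictions before the semicolon cover leucine exactly.
print(expand("TTR") | expand("CTX") == LEU)  # True

# The depiction after the semicolon also matches two phenylalanine codons.
print(sorted(expand("YTX") - LEU))           # ['TTC', 'TTT']
```

The same check applied to MGX, WSX, or TRR turns up codons for other amino acids in the same way.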