The Ureaplasma urealyticum Genome Project

John I. Glass¹, Elliot J. Lefkowitz¹, Jennifer S. Glass¹, Theresa Ngyuen², Kathryn L. Hunkapiller², Cheryl R. Heiner², Ellson Y. Chen², and Gail H. Cassell¹
¹UAB Microbiology
²Advanced Center for Genetic Technology, PE-Applied Biosystems

A poster presented at the American Society for Microbiology Meeting on E. coli and Small Genomes
Snowbird, Utah
October 12-15, 1997

Ureaplasma Genome Project Statistics

Genome size: 751,723 base pairs
G+C content: 25.5%
Sequencing reactions: ~13,000
Average redundancy: 7.8
Labor: 29 man-months
Duration of cloning and sequencing work: 24 months
Cost per finished base pair: 43¢
Project cost: ~$320,000
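The cost and redundancy figures above are linked by simple arithmetic. The sketch below, a minimal illustration rather than the project's own accounting, recomputes the cost per finished base from the total cost and genome size, and derives the mean read length implied by the redundancy figure. The read-length estimate assumes every one of the ~13,000 reactions contributed a read of equal length to the final redundancy, which is an idealization; the function names are hypothetical.

```python
# Figures taken from the project statistics table.
GENOME_BP = 751_723          # genome size in base pairs
PROJECT_COST_USD = 320_000   # approximate total project cost
REACTIONS = 13_000           # approximate number of sequencing reactions
REDUNDANCY = 7.8             # average sequence redundancy (fold coverage)


def cost_per_base_cents(cost_usd: float, genome_bp: int) -> float:
    """Total project cost spread over every finished base, in cents."""
    return 100.0 * cost_usd / genome_bp


def implied_read_length(redundancy: float, genome_bp: int, reactions: int) -> float:
    """Mean read length implied by redundancy = reactions * read_length / genome_bp.

    Assumption: every reaction yielded one usable read of the same length.
    """
    return redundancy * genome_bp / reactions


if __name__ == "__main__":
    print(f"cost per finished base: {cost_per_base_cents(PROJECT_COST_USD, GENOME_BP):.1f} cents")
    print(f"implied mean read length: {implied_read_length(REDUNDANCY, GENOME_BP, REACTIONS):.0f} bases")
```

With the table's figures this gives about 42.6 cents per base, consistent with the rounded 43¢, and an implied average of roughly 450 bases per reaction under the equal-contribution assumption.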

Abstract

We sequenced the genome of the bacterium Ureaplasma urealyticum serovar 3 (Uu). Uu is an opportunistic pathogen of the human urogenital tract and a significant cause of adverse pregnancy outcomes. Uu is the third mycoplasma genome to be completely sequenced. The DNA was sequenced using a combination of random shotgun and ordered shotgun sequencing. One aim of the project was to demonstrate that two scientists could rapidly sequence an entire microbial genome in a cost-effective manner. The sequencing was completed at a cost of ~43¢ per finished base. Approximately 13,000 sequencing reactions were performed, and the average sequence redundancy was 7.8. At the end of the sequencing phase of the project, the final two gaps were closed by sequencing directly from genomic DNA. The sequences were assembled with the PE-Applied Biosystems AutoAssembler software.
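One way to read a redundancy of 7.8 is through the idealized Lander-Waterman model of random shotgun sequencing, in which the chance that a given base is covered by no read is e^(-c) for coverage c. The sketch below applies that model to the project's numbers. It assumes uniformly random read placement and zero minimum overlap for joining reads; because this project also used ordered shotgun sequencing, the model is only a rough reference point, not a description of how this assembly actually behaved. The function name is hypothetical.

```python
import math


def lander_waterman(genome_bp: int, n_reads: int, coverage: float) -> dict:
    """Idealized Lander-Waterman expectations for a purely random shotgun.

    Assumptions: reads are placed uniformly at random, and any overlap,
    however short, is enough to join two reads into one contig.
    """
    uncovered_fraction = math.exp(-coverage)  # P(a given base lies in no read)
    return {
        "uncovered_fraction": uncovered_fraction,
        "expected_uncovered_bp": genome_bp * uncovered_fraction,
        # Expected number of contigs, N * e^(-c); on a circular chromosome
        # this approximates the number of gaps left to close.
        "expected_contigs": n_reads * uncovered_fraction,
    }


if __name__ == "__main__":
    r = lander_waterman(751_723, 13_000, 7.8)
    print(f"{r['expected_contigs']:.1f} contigs, {r['expected_uncovered_bp']:.0f} uncovered bp")
```

Under these idealized assumptions, 7.8-fold coverage of a 751,723 bp genome leaves roughly five contigs and about 300 uncovered bases, which is why some directed gap closure is normally still needed after the shotgun phase.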

Annotation of the Uu genome is proceeding using a combination of sequence analysis on a UNIX workstation and an in-house data management system built on a Microsoft SQL Server database with a Web-browser interface. The set of potential peptides encoded by Uu, along with the complete nucleotide sequence, was subjected to a variety of BLAST database searches and to other analyses, such as tRNAscan for identifying tRNAs. The output of these analysis tools was parsed and imported into tables within the SQL database. The Web interface provides access to all of these data and a framework for building a growing genome map as significant coding sequences are identified. Although this system is less automated than other available annotation systems, it is easily configurable and can be customized and maintained within a small laboratory.
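The parse-and-import step described above can be sketched in a few lines. This is a minimal illustration under several assumptions, not the in-house system: Python's built-in sqlite3 module stands in for Microsoft SQL Server, the input is assumed to be BLAST's 12-column tab-separated output (a format from later BLAST releases, not necessarily what the 1997 tools produced), and the table name `blast_hits`, the function `load_hits`, and the default E-value cutoff are all hypothetical.

```python
import sqlite3
from typing import Iterable

# Column order of BLAST's 12-column tabular output (assumed input format).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def load_hits(conn: sqlite3.Connection, lines: Iterable[str],
              max_evalue: float = 1e-5) -> int:
    """Parse tabular BLAST lines and insert hits with E-value <= max_evalue.

    Returns the number of rows inserted into the hypothetical blast_hits table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blast_hits ("
        "qseqid TEXT, sseqid TEXT, pident REAL, length INTEGER, "
        "mismatch INTEGER, gapopen INTEGER, qstart INTEGER, qend INTEGER, "
        "sstart INTEGER, send INTEGER, evalue REAL, bitscore REAL)"
    )
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        parts = line.rstrip("\n").split("\t")
        if len(parts) != len(FIELDS):
            continue  # skip malformed lines rather than guessing at them
        # pident is a float; columns 3-9 are integer counts/coordinates;
        # evalue and bitscore are floats.
        nums = ([float(parts[2])] + [int(p) for p in parts[3:10]]
                + [float(parts[10]), float(parts[11])])
        if nums[-2] <= max_evalue:  # nums[-2] is the E-value
            rows.append((parts[0], parts[1], *nums))
    placeholders = ",".join("?" * len(FIELDS))
    conn.executemany(f"INSERT INTO blast_hits VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)
```

A call such as `load_hits(sqlite3.connect(":memory:"), open("hits.tsv"))` would load one search's results (the file name is illustrative); the Web front end would then query the table. Filtering on E-value at import time keeps the table small, at the cost of discarding weak hits that a curator might later want to see.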

The single circular chromosome of Uu contains 751,723 base pairs, of which only 25.5% are G or C. Uu contains 3370 open reading frames encoding 50 or more amino acids. Ureaplasmas use an unusual genetic code in which TGA encodes tryptophan instead of signaling translation termination. Although all 62 amino acid-specifying codons (the 64 possible codons minus the two stop codons, TAA and TAG) are used in the Uu genome, the genome potentially encodes only 30 different tRNAs, so individual tRNAs must each read more than one codon. There are two rRNA operons.
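The effect of the reassigned TGA codon on reading-frame counts can be illustrated with a small scanner. This sketch assumes the Mycoplasma/Spiroplasma genetic code (NCBI translation table 4), in which only TAA and TAG terminate translation. It also assumes a simple definition of a reading frame as a stop-free stretch of at least 50 codons on either strand, without requiring a start codon, and treats the sequence as linear, ignoring wraparound on the circular chromosome. The function names are hypothetical, and the count such a scan reports depends on these definitional choices, so it is not guaranteed to match the figure given above.

```python
from typing import List, Tuple

# Assumed: NCBI translation table 4, where TGA encodes tryptophan,
# leaving only TAA and TAG as stop codons.
STOPS = {"TAA", "TAG"}
SENSE_CODONS = 64 - len(STOPS)  # codons left to specify amino acids
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def open_frames(seq: str, min_codons: int = 50) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) for stop-free runs of >= min_codons codons.

    Coordinates are 0-based and end-exclusive, relative to the strand scanned.
    The sequence is treated as linear (no circular wraparound).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if (i - run_start) // 3 >= min_codons:
                        hits.append((strand, run_start, i))
                    run_start = i + 3
            # A run may reach the end of the sequence without a stop codon.
            last = frame + ((len(s) - frame) // 3) * 3
            if (last - run_start) // 3 >= min_codons:
                hits.append((strand, run_start, last))
    return hits
```

For example, `open_frames("TGA" * 60 + "TAG")` reports a 60-codon frame starting at position 0 on the forward strand; under the standard genetic code the same sequence would be read as a string of stop codons.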