Introduction

Whole-genome sequencing of microorganisms is beginning to make possible computer-based genetic analysis that lets us highlight interesting features in the genomic sequence, develop hypotheses about the functional relevance of those features, and then test these predictions in the laboratory. We are currently completing the sequencing and annotation of the whole genome of the bacterium Ureaplasma urealyticum (Uu). This will be the third genome of a mycoplasma species to be completely sequenced, providing a unique opportunity for comparative analysis among these organisms to identify primary sequence features that translate into important phenotypic features and differences among these and other organisms.

Annotation of the Uu sequence is being performed with a system that compiles the results of local BLAST (and other analysis program) runs and loads this information into a Microsoft SQL Server database that we have developed. The front end for this database is a World-Wide Web page that directly accesses the SQL Server data and lets users build individualized queries by setting the desired options on an HTML form. This system is used not only to query the similarity data but also to annotate the sequence directly. Web forms have been developed that allow us to enter coordinate data, along with gene identification and similarity data, directly into our annotation database. These data will be made available for public queries in the near future through any forms-capable web browser. The home page for this site is http://genome.microbio.uab.edu/.
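The idea of loading similarity results into a relational table and filtering them with a parameterized query can be illustrated with a minimal sketch. It is not the schema or code of the system described above: Python's built-in sqlite3 module stands in for Microsoft SQL Server, and the table name, column names, identifiers, and values are all hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the SQL Server back end.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding one row per similarity hit.
conn.execute(
    """
    CREATE TABLE blast_hit (
        orf_id           TEXT,
        subject_id       TEXT,
        e_value          REAL,
        percent_identity REAL
    )
    """
)

# Invented example rows, for illustration only.
rows = [
    ("orf001", "subjA", 1e-50, 62.0),
    ("orf001", "subjB", 1e-5, 30.0),
    ("orf002", "subjC", 0.5, 25.0),
]
conn.executemany("INSERT INTO blast_hit VALUES (?, ?, ?, ?)", rows)


def hits_below(conn, max_e_value):
    """Return (orf_id, subject_id, e_value) for hits at or below a cutoff.

    The cutoff plays the role of an option set on a query form.
    """
    cur = conn.execute(
        "SELECT orf_id, subject_id, e_value FROM blast_hit "
        "WHERE e_value <= ? ORDER BY orf_id, e_value",
        (max_e_value,),
    )
    return cur.fetchall()


for hit in hits_below(conn, 1e-10):
    print(hit)
```

Passing the cutoff as a bound parameter, rather than pasting it into the SQL string, keeps form input from altering the query itself.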

In the course of annotating the Uu sequence, we noticed an interesting difference between the Uu genes and the published sequence of Mycoplasma genitalium (Mg). In the region of the genome that codes for a cluster of ribosomal proteins, the intergenic regions between the Mg coding sequences were all very small, averaging around 1-2 nucleotides and frequently 0 nucleotides. In comparison, the Uu intergenic regions in this cluster averaged around 20 nucleotides.
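The intergenic lengths compared above follow directly from the coordinates of adjacent coding sequences. The sketch below shows one way to compute them; it assumes 1-based inclusive coordinates and counts overlapping genes as a length of 0. The function name, gene names, and coordinates are hypothetical and are not taken from the Uu or Mg annotation.

```python
from statistics import mean


def intergenic_lengths(genes):
    """Return the intergenic lengths between consecutive genes.

    genes: list of (name, start, end) tuples with 1-based inclusive
    coordinates (an assumption of this sketch).
    """
    # Order genes by start coordinate so neighbors are adjacent in the list.
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        # Nucleotides strictly between the two genes; overlaps count as 0.
        gaps.append(max(0, next_start - prev_end - 1))
    return gaps


# Invented coordinates, for illustration only.
example = [("geneA", 1, 300), ("geneB", 301, 620), ("geneC", 643, 900)]
gaps = intergenic_lengths(example)
print(gaps, mean(gaps))
```

Clamping overlaps to 0 is a design choice for this sketch; an analysis that wants to distinguish overlapping genes from abutting ones could keep the negative values instead.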

This region of the genome contains three regulatory operons: the S10 operon, named for the S10 ribosomal protein gene; the spc operon, named for the S5 ribosomal protein gene (site for the antibiotic spectinomycin); and the alpha operon, which codes for the RNA polymerase alpha subunit among other proteins. Differences in intergenic region length might suggest that these organisms use different regulatory mechanisms for protein expression within these operons. It is also possible that Mg, with a genome size of 580 Kbp (Uu is estimated to have a genome size of 760 Kbp), has compressed its intergenic regions in order to minimize the size of its genome. To investigate these possibilities further, we extended our observations to the genomes of other organisms for which sequence in this region is available. This poster describes the comparative analysis of the genomic sequences of seven bacterial organisms in this ribosomal protein cluster, with an emphasis on differences within the intergenic regions.

Table 1. Organisms

      Organism                  Genome size   Accession number(s)
Bs:   Bacillus subtilis         ~4.2 Mbp      U43929; D50302; D64127
Ec:   Escherichia coli          ~4.6 Mbp      X02613; X01563; X02543
Hi:   Haemophilus influenzae    1.8 Mbp       U32761; U32762
Mc:   Mycoplasma capricolum     ~1.1 Mbp      X06414
Mg:   Mycoplasma genitalium     580 Kbp       L43967
Mp:   Mycoplasma pneumoniae     816 Kbp       U00089
Uu:   Ureaplasma urealyticum    ~760 Kbp

The bacterial organisms used in this study are listed along with their genome (or estimated genome) size, and the accession number(s) of the sequences used for analysis.