PromPredict
(Web server for promoter identification in genomic DNA sequences)
 
 

 
 
 

 

Keywords

1.   Average free energy of DNA fragment

The free energy (stability) of a double-stranded DNA molecule can be expressed in terms of the free energy of its constituent base-paired dinucleotides. The average free energy is determined by summing the free energy within a sliding window of 15 base pairs, moved over any stretch of DNA sequence.

2.    DNA stability

DNA stability is a sequence-dependent property: it depends on the sum of the interaction energies between the constituent nucleotides.

3.    E

Average free energy over promoter sequences of 100 nt length (spanning the region from -80 to +20 with respect to known TSSs).

4.    D

The difference between E and the average free energy (REav) over the shuffled downstream sequence (+100 to +500 nt with respect to the TSS).
 


 
Introduction

Analysis of various predicted structural properties of promoter regions in prokaryotic as well as eukaryotic genomes indicates that they have several common features, such as lower stability, higher curvature and less bendability, when compared with their neighboring regions. Based on the difference in stability between neighboring upstream and downstream regions in the vicinity of experimentally determined transcription start sites (TSS), a promoter prediction algorithm (PromPredict) has been developed to identify promoter regions in prokaryotic genomic DNA (Kanhere A and Bansal M 2005a, 2005b).

Promoter Prediction Methodology

The average free energy (E) over known promoter sequences and the difference (D) between E and the average free energy over downstream random sequences (REav) are used to search for promoters in genomic sequences. The difference in free energy (stability) between neighboring regions is calculated and compared with the assigned cutoff, which is obtained from the energy difference between upstream and downstream regions in the vicinity of known TSSs, to predict promoters in genomic DNA sequences (Rangannan V and Bansal M 2007, 2009).
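The comparison described above can be sketched as a scan along a precomputed free-energy profile. This is only an illustration under stated assumptions, not the server's implementation: the function name, the `gap` between the two regions, the size of the downstream window, and the rule that both values must be at or above their cutoffs (consistent with promoters being less stable, hence less negative in free energy) are all hypothetical choices, and the cutoff values must be supplied by the caller.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def scan_for_promoters(profile, e_cutoff, d_cutoff,
                       window=100, gap=50, downstream_window=100):
    """Return (position, E1, D) for every start position that passes both cutoffs.

    profile           : per-position free energies in kcal/mol (assumed precomputed)
    e_cutoff, d_cutoff: thresholds supplied by the caller (hypothetical values)
    window            : length of the upstream region used for E1
    gap               : ASSUMED spacing between the two regions
    downstream_window : ASSUMED length of the neighboring downstream region
    """
    hits = []
    span = window + gap + downstream_window
    for n in range(len(profile) - span + 1):
        e1 = mean(profile[n:n + window])
        start2 = n + window + gap
        e2 = mean(profile[start2:start2 + downstream_window])
        d = e1 - e2
        # Less stable (less negative) upstream region relative to downstream.
        if e1 >= e_cutoff and d >= d_cutoff:
            hits.append((n, e1, d))
    return hits
```

A profile shorter than `window + gap + downstream_window` simply yields no hits. Passing `window=50` gives a scan with a smaller upstream region.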

Free energy (stability) calculation

The stability of a double-stranded DNA molecule can be expressed as the sum of the free energies of its constituent base-paired dinucleotides. In the present study, the free energy over a long continuous stretch of DNA sequence was calculated by dividing the sequence into overlapping windows of 15 base pairs (or 14 dinucleotide steps). The energy values corresponding to the 16 dinucleotide steps (10 unique dinucleotides) are taken from the unified parameters obtained from melting studies on 108 oligonucleotides (Allawi and SantaLucia 1997; SantaLucia 1998).

Dinucleotide step    Free energy (kcal/mol)
AA                   -1.0
TT                   -1.0
AT                   -0.88
TA                   -0.58
CA                   -1.45
TG                   -1.45
AC                   -1.44
GT                   -1.44
CT                   -1.28
AG                   -1.28
GA                   -1.30
TC                   -1.30
CG                   -2.17
GC                   -2.24
GG                   -1.84
CC                   -1.84

The following figure illustrates the variation in average free energy (AFE) with dinucleotide composition, for sequences of 101 nt length that consist of dinucleotide repeats. The average free energy has been calculated for 15-mer fragments and plotted for each sequence.

[Figure: dinu_stb (average free energy of dinucleotide-repeat sequences)]

Threshold calculation

Promoter sequences of 1001 nt length, corresponding to TSSs that are at least 500 nt apart and associated with protein-coding genes, from three different bacteria (E. coli, B. subtilis and M. tuberculosis) were categorized on the basis of their GC composition (at 5% GC intervals). The average free energy (E) over the proximal promoter region (spanning -80 to +20 nt with respect to the TSS) and the average free energy (REav) over the shuffled sequences generated from the downstream region (+100 nt to +500 nt with respect to the TSS) were calculated for known TSSs within each defined range of GC content (Rangannan V and Bansal M 2009).
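
The quantities used for the thresholds can be sketched as follows. This is an illustration under assumptions rather than the published procedure: the TSS is taken to sit at 0-based index 500 of the 1001 nt sequence, the number of shuffles and the random seed are arbitrary, and `afe` is a hypothetical caller-supplied function that returns the average free energy of a sequence. All function names are hypothetical.

```python
import random

TSS_INDEX = 500  # ASSUMED 0-based position of the TSS in a 1001 nt sequence


def gc_percent(seq):
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_bin(seq, width=5):
    """Return the (low, high) %GC interval containing seq, e.g. (45, 50)."""
    low = int(gc_percent(seq) // width) * width
    return low, low + width


def shuffled(seq, rng):
    """Return seq with its bases in random order (composition is preserved)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def e_reav_d(seq, afe, n_shuffles=10, seed=0):
    """Return (E, REav, D) for one 1001 nt promoter sequence.

    afe        : caller-supplied function, sequence -> average free energy
    n_shuffles : ASSUMED number of shuffled downstream copies to average over
    """
    if len(seq) != 1001:
        raise ValueError("expected a 1001 nt sequence centred on the TSS")
    rng = random.Random(seed)
    promoter = seq[TSS_INDEX - 80:TSS_INDEX + 20]        # -80 to +20, 100 nt
    downstream = seq[TSS_INDEX + 100:TSS_INDEX + 500]    # +100 to +500, 400 nt
    e = afe(promoter)
    reav = sum(afe(shuffled(downstream, rng)) for _ in range(n_shuffles)) / n_shuffles
    return e, reav, e - reav
```

Grouping the resulting (E, D) pairs by `gc_bin` would give one set of values per 5% GC interval, from which cutoffs could then be chosen.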

The cut-off values assigned for the AFE 'E' of a 100 nt long fragment and for the difference 'D' between 'E' and the AFE of the downstream shuffled sequence (REav), derived from the TSS dataset, cover seven ranges of GC content (30 to 65 %GC at 5% intervals). These cut-off values have since been extended to cover the extreme %GC range from 15 to 80% by including data from sequences flanking translation start sites (TLS) in 913 microbial genomes (Rangannan V and Bansal M 2010). The TSS-TLS cut-off values used to predict promoter regions in a given genome sequence are illustrated in the following figure.

[Figure: TSS-TLS_cutoff (TSS-TLS cut-off values across %GC ranges)]

Window size

A default window size of 100 nt is used to calculate E1; it gives high sensitivity as well as precision for identifying promoters. If no promoter signal is identified, a 50 nt window can be specified to calculate E1.

References

1. Allawi H T and SantaLucia J Jr 1997 Thermodynamics and NMR of internal G·T mismatches in DNA; Biochemistry 36:10581-10594.
2. Kanhere A and Bansal M 2005a Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes; Nucleic Acids Res. 33:3165-3175.
3. Kanhere A and Bansal M 2005b A novel method for prokaryotic promoter prediction based on DNA stability; BMC Bioinformatics 6:1.
4. Rangannan V and Bansal M 2007 Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability; J. Biosci. 32(5):851-862.
5. Rangannan V and Bansal M 2009 Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition; Mol. BioSyst. 5:1758-1769.
6. Rangannan V and Bansal M 2010 High Quality Annotation of Promoter Regions for 913 Bacterial Genomes; Bioinformatics 26(24):3043-3050.
7. SantaLucia J Jr 1998 A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbour thermodynamics; Proc. Natl. Acad. Sci. USA 95:1460-1465.



Questions or problems regarding this web site should be directed to [mb@mbu.iisc.ernet.in].
Copyright © 2008 [Molecular Biophysics Unit,IISC]. All rights reserved.

Last modified: 4/27/2010.