
Phys. Rev. Lett. 94, 178103 (2005) [4 pages]

Divergence and Shannon Information in Genomes


Hong-Da Chen (1), Chang-Heng Chang (1), Li-Ching Hsieh (4), and Hoong-Chien Lee (1,2,3)
(1) Department of Physics, National Central University, Chungli, Taiwan 320, Republic of China
(2) Department of Life Science, National Central University, Chungli, Taiwan 320, Republic of China
(3) Center for Complex Systems, National Central University, Chungli, Taiwan 320, Republic of China
(4) Institute of Information Science and Genomics Research Center, Academia Sinica, Taipei, Taiwan 115, Republic of China

Received 23 September 2004; published 5 May 2005

Shannon information (SI) and its special case, divergence, are defined for a DNA sequence in terms of the probabilities of chemical words in the sequence, and are computed for a set of complete genomes that are highly diverse in length and composition. We find the following: SI (but not divergence) is inversely proportional to sequence length for a random sequence but is length independent for genomes; the genomic SI is always greater than the SI of a random sequence whose length and composition match those of the genome, and for shorter words and longer sequences it is hundreds to thousands of times greater; and genomic SIs appear to have universal values that depend on word length. We infer that this universality is an evolutionary footprint of a universal mode of genome growth.
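
The abstract does not spell out the formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that the SI for word length k is the shortfall of the Shannon entropy of overlapping k-letter words from its maximum of log2(4^k) = 2k bits, and that a random sequence matching the genome's length and composition can be produced by shuffling the genome. Divergence is left out because its precise definition is not given here. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def kmer_probabilities(seq: str, k: int) -> dict:
    """Empirical probabilities of overlapping k-letter words ("chemical words") in seq."""
    n_words = len(seq) - k + 1
    if k < 1 or n_words <= 0:
        raise ValueError("need 1 <= k <= len(seq)")
    counts = Counter(seq[i:i + k] for i in range(n_words))
    return {word: c / n_words for word, c in counts.items()}


def shannon_entropy(probs) -> float:
    """Shannon entropy, in bits, of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def shannon_information(seq: str, k: int) -> float:
    """ASSUMED definition: SI_k = H_max - H_k, with H_max = log2(4**k) = 2k bits."""
    h_k = shannon_entropy(kmer_probabilities(seq, k).values())
    return 2 * k - h_k


def composition_matched_random(seq: str, rng: random.Random) -> str:
    """Hypothetical helper: shuffle seq, keeping its length and base composition exactly."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def uniform_random_sequence(n: int, rng: random.Random) -> str:
    """Random A/C/G/T sequence of length n with equal base probabilities."""
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10_000, 100_000):
        s = uniform_random_sequence(n, rng)
        # For a random sequence the estimated SI is mostly finite-sample bias,
        # so it should shrink roughly tenfold when n grows tenfold.
        print(n, round(shannon_information(s, 3), 5))
```

Under this assumed definition, the SI estimated from a random sequence is dominated by the finite-sample bias of the entropy estimate, roughly (4^k - 1)/(2N ln 2) bits for N words. That bias is consistent with the 1/N length dependence the abstract reports for random sequences. Comparing `shannon_information(genome, k)` with the same quantity for `composition_matched_random(genome, rng)` mirrors, in spirit, the genome-versus-random comparison described above.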

© 2005 The American Physical Society

URL: http://link.aps.org/doi/10.1103/PhysRevLett.94.178103
DOI: 10.1103/PhysRevLett.94.178103
PACS: 87.10.+e, 02.50.−r, 87.14.Gg, 87.23.Kg