Evolutionary Computation

Quarterly (Spring, Summer, Fall, Winter)
141 pp. per issue
7 x 10
Founded: 1993
ISSN 1063-6560

E-ISSN 1530-9304
2008 ISI Impact Factor: 3.000

Winter 2004, Vol. 12, No. 4, Pages 495-515
Posted Online March 13, 2006.
(doi:10.1162/1063656043138923)
© 2004 Massachusetts Institute of Technology
Measures of Diversity for Populations and Distances Between Individuals with Highly Reorganizable Genomes

Claudio Mattiussi

Institute of Systems Engineering, Swiss Federal Institute of Technology of Lausanne (EPFL), 1015 Lausanne, Switzerland

Markus Waibel

Institute of Systems Engineering, Swiss Federal Institute of Technology of Lausanne (EPFL), 1015 Lausanne, Switzerland

Dario Floreano

Institute of Systems Engineering, Swiss Federal Institute of Technology of Lausanne (EPFL), 1015 Lausanne, Switzerland

In this paper we address the problem of defining a measure of diversity for a population of individuals whose genomes can undergo major reorganizations during the evolutionary process. To this end, we introduce a measure of diversity for populations of variable-length strings defined on a finite alphabet, and from this measure we derive a semi-metric distance between pairs of strings. The definitions are based on counting the number of substrings of the strings, considered first separately and then collectively. This approach is related to the concept of linguistic complexity, whose definition we generalize from single strings to populations. Using the substring count approach, we also define a new kind of Tanimoto distance between strings. We show how to extend the approach to representations that are not based on strings and, in particular, to the tree-based representations used in the field of genetic programming. We describe how suffix trees allow these measures and distances to be implemented with a computational cost that is linear in both space and time relative to the length of the strings and the size of the population. The definitions were devised to assess the diversity of populations having genomes of variable length and variable structure during evolutionary computation runs, but applications in quantitative genomics, proteomics, and pattern recognition can also be envisaged.
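
The substring-counting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions or normalizations, so everything below is an assumption made for illustration: the function names are hypothetical, the Tanimoto distance is taken in its usual set form (one minus the ratio of shared to total distinct substrings), and `substring_ratio` is only one plausible way to compare the collective count with the separate counts. The substrings are enumerated naively, which costs quadratic time and more in space; the suffix-tree implementation the abstract mentions is what would bring this down to linear cost.

```python
def substrings(s):
    """Return the set of all distinct non-empty substrings of s (naive enumeration)."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def population_substrings(population):
    """Distinct substrings of the strings considered collectively."""
    collective = set()
    for s in population:
        collective |= substrings(s)
    return collective


def tanimoto_distance(a, b):
    """Set-form Tanimoto distance on substring sets (assumed form, not the paper's)."""
    sa, sb = substrings(a), substrings(b)
    union = sa | sb
    if not union:
        # Two empty strings share everything they have, so distance is zero.
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def substring_ratio(population):
    """Hypothetical diversity indicator: collective count over summed separate counts.

    The value falls toward 1/len(population) when all strings are identical
    and reaches 1.0 when no substring is shared between any two strings.
    """
    separate = sum(len(substrings(s)) for s in population)
    if separate == 0:
        return 0.0
    return len(population_substrings(population)) / separate
```

For instance, `tanimoto_distance("ab", "ba")` compares the substring sets {a, b, ab} and {a, b, ba}, which share two of four distinct substrings, so the distance under this assumed definition is 0.5.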
