Conditions for Outperformance of Recombination in Online Evolution of Swarm Robots

Genetic recombination is commonly used in evolutionary algorithms and yet its benefits are an open question in evolutionary biology. We investigate when recombination is actually beneficial in the evolutionary adaptation of swarm robot behaviour in dynamic environments. In this scenario, artificial evolution has to deal with challenges that are similar to natural evolution: it must run online, distributed and evolve the genome structure. These requirements could diminish the benefit of recombination due to disruptive crossover. Using neural networks as robot controllers, we reduce this disruptiveness with an adaptive mate choice that evolves the probability of recombination and the genetic similarity of mates. In two experiments with a multi-agent simulation, we compare the adaptive performance of this approach with random recombination and pure mutation. Whereas both recombination treatments naturally outperform at low mutation rates, pure mutation achieves its best performance with high rates, where it also outperforms random recombination. The adaptive mate choice, however, achieves the same performance as pure mutation at high rates and outperforms when the network size is increased. We also found that treatments with recombination evolved smaller neural networks with fewer links.


Introduction
It is a major challenge to give artificial, autonomous systems adaptive and problem-solving capabilities. One approach to this problem attempts to mimic the impressive results of natural evolution by emulating its mechanisms, such as mutation, recombination and selection, in so-called evolutionary algorithms (Eiben and Smith, 2003). The prevalence of recombination in evolutionary algorithms is interesting, given that it is not essential for an evolutionary process and that it is an ongoing discussion in evolutionary biology why sex and recombination are beneficial (Rice, 2002). Sex is considered costly because asexuals can potentially reproduce twice as fast (Maynard Smith, 1978). But sex is predominant in nature, a fact that numerous theories attempt to explain by attributing benefits to sexuality and recombination that compensate for those costs (West et al., 1999). One prominent argument is that sex accelerates adaptation to changing environments (Bell, 1982). In evolutionary algorithms, however, recombination is not considered costly because there is no actual reproduction, and benefits have been shown in many cases (Eiben and Bäck, 1997; Doerr et al., 2008).
In this work, we investigate recombination in the case of evolution of swarm robot behaviour in dynamic environments. This case has special requirements for the evolutionary algorithm that make it more similar to natural evolution, and we ask whether recombination still exhibits clear benefits. In swarm robotics, many small robots are deployed with limited individual capabilities, but the swarm can have emergent capabilities through cooperation (Sahin, 2005). Due to the difficulty of developing cooperative behaviour for swarm robots, evolution is often employed for this purpose (Haasdijk et al., 2010; Bredeche et al., 2010). One major challenge in swarm robotics is to make the swarm fully autonomous and adaptive so it can operate independently in the dynamic, real world, which is similar to what natural organisms do.
The special requirements of this challenge complicate the application of conventional evolutionary approaches, for example Evolution Strategies (Beyer and Schwefel, 2002). First, swarm robotics avoids using a central supervising instance because the robots should operate autonomously with only local information. Many evolutionary algorithms are centralized and not capable of such distributed operation. Second, to be able to deal with a-priori unknown and dynamic environments, constant adaptation is required, which necessitates an online evolutionary algorithm (Agogino et al., 2000). Online evolution optimizes a system in parallel while it is deployed in its task. Every candidate evaluation is done in the local, variable conditions and affects total system performance, whereas offline evolution can evaluate under repeated, constant conditions and deploy only the best solutions. And last, a large flexibility in evolvable behaviours is needed to deal with unknown environments. Here we use neural networks, which are a well-established approach for evolving robot behaviour (Floreano and Mondada, 1994). It has been shown that evolving the structure of the neural networks by adding and removing neurons and links increases the flexibility and is advantageous (Stanley and Miikkulainen, 2002). However, structural evolution of neural networks complicates their recombination and can lead to disruptive macro mutations unless network structures are correctly matched, for example in NEAT (Stanley and Miikkulainen, 2002).
There exist evolutionary algorithms that address each of those requirements individually but not simultaneously. We have recently developed an approach to fill this gap for distributed online evolution with structural evolution of neural networks (Schwarzer et al., 2011). With this approach, we investigate here when recombination is beneficial in a simulation experiment with four robots in a foraging task. This is a very low number for a swarm, but it tests the effectiveness of the evolutionary algorithm even with limited resources. The approach can be scaled up and is applicable to real swarm robots because information is exchanged only locally. Our expectation is that recombination increases the speed of adaptation and that in a limited time, an unfit population increases its performance faster than when only mutation is used.
The mutation rate is a crucial parameter of this comparison because it is known that increasing the mutation rate increases the speed of adaptation until it reaches a point where performance is reduced, called the error threshold (Eigen, 1971; Ochoa and Harvey, 1999). We also include different neural network sizes in the comparison because it has been shown that longer genomes can reduce the error threshold (Ochoa, 2006).

Material and Methods
The experimental setup is an extension of our previously published study (Schwarzer et al., 2011). Compared to the earlier work, the genome structure and evolutionary algorithm have been expanded to improve the performance of recombination.

Genome and Neural Network
The genome encodes a recurrent neural network with a variable number of hidden neurons. Any connection between hidden neurons is possible. The employed neuron model uses the weighted sum of inputs with bias and a sigmoid activation function.
In Figure 1, an overview of the genome structure is shown, which differs from our earlier approach. The genome is now diploid with two homologous chromosomes. Each chromosome is an array of gene sites; each gene site can be free or occupied by a gene. This arrangement makes it possible to mutate the chromosome by relocating genes to free sites without shifting the absolute positions of other genes. In this way, crossover is simplified because the majority of genes in related chromosomes always line up at the same position. There are two different gene types: node genes and link genes. A node gene contains a node ID and a bias value for the activation function; it produces a node with the respective ID and parameters. A link gene contains two node IDs and a link weight; if there are node genes for both IDs, a neural link is created between them with the given weight. When new nodes are generated during mutation, they are assigned a random identifier. The identifier is used to recognize if two genes on the same gene site share a common ancestry.
Figure 1: The genome has two chromosomes. A chromosome is an array of gene sites which can be occupied or empty. Genes come in two types, node genes and link genes. A node gene creates one node in the neural network, a link gene creates one link.
A value can be calculated between two chromosomes that describes the genetic similarity s from 0 (completely different) to 1 (identical) in a similar way to the Sørensen Index (Sørensen, 1948). It is computed by comparing each homologous gene site, counting the similar sites c and the number of sites of both genomes (n_a, n_b) according to the following formula:
$$s = \frac{2c}{n_a + n_b}$$
Genes are similar when they use the same node IDs. For the similarity between two genomes, the chromosomes are paired and the pairwise chromosome similarity values are averaged.
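The similarity measure can be illustrated with a minimal sketch, assuming the standard Sørensen form s = 2c / (n_a + n_b). The data layout (a chromosome as a list of gene sites, each either `None` or a tuple of node IDs), the function names, and the reading of n_a and n_b as the number of occupied sites are assumptions made for illustration, not the authors' implementation.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a gene is reduced to its node IDs,
# (id,) for a node gene and (src, dst) for a link gene; None is a free site.
Gene = Tuple[int, ...]
Chromosome = Sequence[Optional[Gene]]


def chromosome_similarity(a: Chromosome, b: Chromosome) -> float:
    """Sørensen-style similarity 2c / (n_a + n_b) over homologous gene sites."""
    n_a = sum(g is not None for g in a)  # occupied sites on a (assumed meaning of n_a)
    n_b = sum(g is not None for g in b)
    if n_a + n_b == 0:
        return 1.0  # assumption: two empty chromosomes count as identical
    # A site is "similar" when both chromosomes hold a gene with the same node IDs.
    c = sum(1 for ga, gb in zip(a, b) if ga is not None and ga == gb)
    return 2.0 * c / (n_a + n_b)


def genome_similarity(g1: Sequence[Chromosome], g2: Sequence[Chromosome]) -> float:
    """Pair the chromosomes of two genomes and average the pairwise values."""
    pairs = list(zip(g1, g2))
    return sum(chromosome_similarity(a, b) for a, b in pairs) / len(pairs)
```

For example, two chromosomes with two genes each that agree on one site yield s = 2·1 / (2 + 2) = 0.5.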
The genes of both chromosomes are used to create the neural network. Thus, it occurs regularly that there are multiple genes encoding the same structural element. This is normally the case for genes on homologous sites but is also possible with genes across different sites due to relocation and recombination. These genes can have different parameters (weights, bias) for the same structure, similar to different alleles in biology. In such cases, a dominance value is computed for the duplicate genes based on their parameter values. Dominant and recessive parameter values are evenly and finely distributed across the range of possible values. Only the allele with the highest dominance value is used, and if multiple, different alleles have the same dominance, their parameters are averaged.
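The resolution rule for duplicate genes can be sketched as follows. The text does not specify how the dominance value is computed from a parameter, so the sketch takes the dominance function as an argument; `resolve_alleles` and the stand-in dominance function used in the usage note are hypothetical.

```python
from typing import Callable, Sequence


def resolve_alleles(values: Sequence[float],
                    dominance: Callable[[float], float]) -> float:
    """Pick the parameter value expressed for one structural element.

    `values` holds the parameters (weight or bias) of all genes that encode
    the same node or link; `dominance` maps a parameter to its dominance
    value (its actual form is not given in the text).
    """
    scores = [dominance(v) for v in values]
    best = max(scores)
    # Keep only the most dominant alleles; ties are averaged.
    winners = [v for v, s in zip(values, scores) if s == best]
    return sum(winners) / len(winners)
```

With `abs` as a purely illustrative dominance function, `resolve_alleles([-0.5, 0.2, 0.5], abs)` averages the two tied alleles -0.5 and 0.5.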
Outside of this chromosome structure, the genome contains three evolvable parameters. The first is a mutation factor that is multiplied with the standard deviation of parameter mutation operations and with the probability of structural mutations. A mutation factor with a value of 1 leads to the default, unaltered mutation rate; values larger than 1 increase it and values smaller than 1 decrease it. The two other parameters are used for the adaptive mating mechanism, which is explained in more detail in the description of the evolutionary algorithm.
Mutation of the genome is done on a per-gene basis. Every gene is subjected to parameter mutation where weight and bias are mutated using a normal distribution with σ = 0.01. With a probability of 0.05, a gene undergoes structural mutation: a node gene changes its node ID to a new random ID and a link gene changes its source or destination ID to a different one that is already present on the genome. Furthermore, the gene may be completely removed (p = 0.01), duplicated with the copy being moved to a random position on the chromosome (p = 0.01) or relocated to a random, free site on the chromosome (p = 0.02).
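A minimal sketch of the per-gene mutation described above, using the stated rates. Several details are assumptions for illustration: the class and function names, scaling all structural probabilities (including removal, duplication and relocation) by the mutation factor, treating those three operations as mutually exclusive, placing duplicates only on free sites, and drawing replacement link IDs from the node genes on the same chromosome.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Union

SIGMA = 0.01         # std. dev. of weight/bias mutation (from the text)
P_STRUCTURAL = 0.05  # probability of an ID change (from the text)
P_REMOVE = 0.01
P_DUPLICATE = 0.01
P_RELOCATE = 0.02


@dataclass(frozen=True)
class NodeGene:
    node_id: int
    bias: float


@dataclass(frozen=True)
class LinkGene:
    src: int
    dst: int
    weight: float


Gene = Union[NodeGene, LinkGene]


def mutate_chromosome(sites: List[Optional[Gene]], factor: float,
                      rng: random.Random) -> List[Optional[Gene]]:
    """Return a mutated copy of one chromosome (None marks a free site)."""
    sites = list(sites)
    known_ids = [g.node_id for g in sites if isinstance(g, NodeGene)]
    # Only genes present before mutation are visited, so fresh copies
    # are not mutated a second time in the same pass.
    occupied = [i for i, g in enumerate(sites) if g is not None]
    for i in occupied:
        gene = sites[i]
        # Parameter mutation: Gaussian noise on bias or weight.
        noise = rng.gauss(0.0, SIGMA * factor)
        if isinstance(gene, NodeGene):
            gene = replace(gene, bias=gene.bias + noise)
        else:
            gene = replace(gene, weight=gene.weight + noise)
        # Structural mutation: new random node ID, or rewired link end.
        if rng.random() < P_STRUCTURAL * factor:
            if isinstance(gene, NodeGene):
                gene = replace(gene, node_id=rng.getrandbits(32))
            else:
                end = rng.choice(("src", "dst"))
                options = [n for n in known_ids if n != getattr(gene, end)]
                if options:
                    gene = replace(gene, **{end: rng.choice(options)})
        sites[i] = gene
        # Removal, duplication or relocation (assumed mutually exclusive).
        free = [j for j, g in enumerate(sites) if g is None]
        r = rng.random()
        if r < P_REMOVE * factor:
            sites[i] = None
        elif r < (P_REMOVE + P_DUPLICATE) * factor:
            if free:
                sites[rng.choice(free)] = gene
        elif r < (P_REMOVE + P_DUPLICATE + P_RELOCATE) * factor:
            if free:
                sites[rng.choice(free)] = gene
                sites[i] = None
    return sites
```

Because genes only move to free sites, the absolute positions of all other genes stay fixed, which is the property the gene-site layout is meant to provide.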
An offspring genome is created in a way similar to meiosis. First, a mutated copy of the whole genome is created for each parent. Then the chromosomes in each copy are crossed over with a probability of 0.02 per gene site. One resulting chromosome is picked from each parent, and the two are subsequently fused to form a diploid offspring genome.
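The meiosis-like step can be sketched as below, assuming that both homologous chromosomes have the same number of gene sites and that a crossover event at a site switches which strand is copied from that site onwards. The function names are hypothetical, and the preceding mutation of each parent copy is omitted.

```python
import random
from typing import List, Sequence, Tuple

P_CROSSOVER = 0.02  # crossover probability per gene site (from the text)


def crossover(chrom_a: Sequence, chrom_b: Sequence, rng: random.Random,
              p_cross: float = P_CROSSOVER) -> Tuple[List, List]:
    """Cross two homologous chromosomes site by site."""
    out_a, out_b = [], []
    swapped = False
    for gene_a, gene_b in zip(chrom_a, chrom_b):
        if rng.random() < p_cross:
            swapped = not swapped  # crossover point: switch strands
        if swapped:
            gene_a, gene_b = gene_b, gene_a
        out_a.append(gene_a)
        out_b.append(gene_b)
    return out_a, out_b


def make_gamete(genome: Sequence[Sequence], rng: random.Random,
                p_cross: float = P_CROSSOVER) -> List:
    """Cross over the two chromosomes of a diploid genome and pick one result."""
    first, second = crossover(genome[0], genome[1], rng, p_cross)
    return rng.choice([first, second])


def make_offspring(mother: Sequence[Sequence], father: Sequence[Sequence],
                   rng: random.Random) -> Tuple[List, List]:
    """Fuse one gamete from each parent into a diploid offspring genome."""
    return make_gamete(mother, rng), make_gamete(father, rng)
```

Since genes keep their absolute site positions, the crossover never has to align chromosomes; it only chooses, per site, which homologue contributes.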

Evolutionary Algorithm
One instance of the evolutionary algorithm runs on each simulated robot and maintains an island population of ten genomes that serve as parental genome pool. In one cycle of online evolution, one offspring genome is generated from this genome pool and used to create a neural network. The neural network controls the robot for 2,500 simulation ticks (about 50 seconds in real time) during which it is evaluated based on the expressed behaviour. At the end of this time, the offspring genome may survive and replace one member of the island population, or it may be discarded, depending on its evaluation. Genomes also migrate between island populations independently of the evaluation cycle. When two robots are in close proximity of each other, one robot transmits and removes one random genome from its population; the recipient responds by sending a random genome back. This exchange occurs at most once every 10,000 ticks per robot. There are alternative approaches for this migration process which are possibly more effective, for example by only transmitting the best genome, but the given approach is sufficient for the purpose of this work in creating one virtual large population. The operation of this evolutionary algorithm is illustrated in Figure 2.
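The migration step can be sketched as a rate-limited swap between two island populations. The `Island` class and `exchange` function are hypothetical names, and the sketch assumes that the recipient draws its reply from its own original pool and removes it, so that both populations keep a constant size.

```python
import random
from typing import List, Optional

MIGRATION_COOLDOWN = 10_000  # minimum ticks between exchanges per robot (from the text)


class Island:
    """Island population held by one robot."""

    def __init__(self, genomes: List) -> None:
        self.genomes = list(genomes)
        self.last_migration: Optional[int] = None

    def can_migrate(self, tick: int) -> bool:
        return (self.last_migration is None
                or tick - self.last_migration >= MIGRATION_COOLDOWN)


def exchange(sender: Island, recipient: Island, tick: int,
             rng: random.Random) -> bool:
    """Swap one random genome between two robots that are in close proximity."""
    if not (sender.can_migrate(tick) and recipient.can_migrate(tick)):
        return False
    # The sender transmits and removes one random genome ...
    outgoing = sender.genomes.pop(rng.randrange(len(sender.genomes)))
    # ... and the recipient answers with a random genome of its own.
    reply = recipient.genomes.pop(rng.randrange(len(recipient.genomes)))
    sender.genomes.append(reply)
    recipient.genomes.append(outgoing)
    sender.last_migration = recipient.last_migration = tick
    return True
```

Only the two robots involved take part in the swap, so no central instance is needed.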
In order to create an offspring, one parent genome of the island population is selected randomly. We call this genome female here, but note that the genomes are equivalent to hermaphrodites; each genome can have offspring with any other genome within the island population. The female can select a mate for recombination, depending on the employed mate selection strategy. If mate selection decides not to pick a mate, a mutated copy of the female is generated as offspring, which is always the outcome in the treatment without recombination. With random mating, mate selection picks a random genome from the island population (excluding the female itself). The adaptive mate selection strategy uses evolvable parameters of the female to influence the selection of the mate. First, the genetic similarity between the female and all other genomes of the local population is calculated. These similarity values are evaluated with a Gaussian function, with µ and σ given by the female genome, to result in a value of "attractiveness" for each mate between 0 and 1. In other words, genomes have an evolvable value for the ideal genetic similarity of mates, and they can also evolve how much they are willing to deviate from this ideal. One mate is picked from the candidates with a random roulette selection, using the attractiveness as weights, so candidates whose genetic similarity is closer to the ideal value have a larger likelihood of being chosen. However, it is also possible that no mate is picked when the sum of attractiveness of all candidates is less than 1. This can occur on purpose when the genome evolves, for example, a narrow range of acceptable values; it has then effectively reduced the probability of recombination.
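The adaptive mate choice can be sketched as a Gaussian weighting followed by a roulette wheel. The text does not spell out how "no mate" arises when the attractiveness values sum to less than 1; the sketch assumes the wheel is spun over a range of at least 1, so the uncovered remainder means no mate is chosen. The function names are hypothetical.

```python
import math
import random
from itertools import accumulate
from typing import Optional, Sequence


def attractiveness(similarity: float, mu: float, sigma: float) -> float:
    """Unnormalised Gaussian: 1 at the ideal similarity mu, falling towards 0."""
    return math.exp(-((similarity - mu) ** 2) / (2.0 * sigma ** 2))


def select_mate(similarities: Sequence[float], mu: float, sigma: float,
                rng: random.Random) -> Optional[int]:
    """Return the index of the chosen mate, or None if no mate is picked.

    `similarities` holds the genetic similarity between the female and every
    other genome of the island population; mu and sigma come from the female.
    """
    if not similarities:
        return None
    cumulative = list(accumulate(attractiveness(s, mu, sigma)
                                 for s in similarities))
    total = cumulative[-1]
    # Assumption: the wheel spans at least 1, so a total below 1 leaves
    # a gap in which no candidate is selected.
    r = rng.random() * max(1.0, total)
    for index, upper in enumerate(cumulative):
        if r < upper:
            return index
    return None
```

Under this reading, a female with a very small σ and an ideal similarity that no candidate matches almost never recombines, which is how a low recombination rate can evolve.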
In the survivor selection, which is identical in all three treatments, the freshly evaluated candidate genome is compared to the genomes in the island population using a special metric, which we call the "fitness score". The fitness score is the average of the last six offspring evaluations of a genome. A genome in the island population that has not yet had six offspring uses its own evaluation to seed this moving average. The candidate genome must have an evaluation greater than the worst fitness score in the population to survive and replace this worst genome. This mechanism makes the evolutionary algorithm tolerant to changes in the environment. A genome that once evaluated well will produce unfit offspring in a different environment; its fitness score drops as a result of low offspring evaluations and it can be more easily surpassed.
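A minimal sketch of the fitness score and the replacement rule follows. It assumes that the moving-average window is initially filled with the genome's own evaluation, that an offspring evaluation is credited to its parents before the comparison, and that ties do not lead to replacement. `Member` and `survivor_selection` are hypothetical names.

```python
from collections import deque
from typing import Any, List, Sequence

WINDOW = 6  # number of offspring evaluations in the moving average (from the text)


class Member:
    """A genome in the island population together with its fitness score."""

    def __init__(self, genome: Any, own_evaluation: float) -> None:
        self.genome = genome
        # Assumption: the window is seeded with the genome's own evaluation.
        self.history = deque([own_evaluation] * WINDOW, maxlen=WINDOW)

    def record_offspring(self, evaluation: float) -> None:
        self.history.append(evaluation)  # the oldest entry drops out

    @property
    def fitness_score(self) -> float:
        return sum(self.history) / len(self.history)


def survivor_selection(population: List[Member], parents: Sequence[Member],
                       candidate: Any, evaluation: float) -> bool:
    """Credit the evaluation to the parents, then try to replace the worst member."""
    for parent in parents:
        parent.record_offspring(evaluation)
    worst = min(population, key=lambda m: m.fitness_score)
    if evaluation > worst.fitness_score:
        population[population.index(worst)] = Member(candidate, evaluation)
        return True
    return False
```

A genome whose offspring start to evaluate poorly after an environmental change sees its score fall within six offspring, so it cannot block replacement indefinitely.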

Scenario and Simulation Environment
We use a foraging scenario that was designed to be practical and realizable in existing hardware. It requires search, identification and harvesting of power sources, an essential task of an autonomous swarm. The particular challenge for the neural network evolution is the processing of visual colour information to distinguish resources from surroundings.
The experiment is done in the same 2D multi-agent simulation as our previous study (Schwarzer et al., 2011) and uses a similar scenario: four robots are in an arena with ten power stations that can be harvested by being next to them (see Fig. 3 for a visualisation of the arena). The stations charge only slowly and the best strategy is to continuously search the arena for non-depleted power stations.
The robots are modelled after a small, agile swarm robot like the e-puck (Mondada et al., 2009) or Wanda (Kettler et al., 2010) with a differential drive, distance sensors and a visual sensor that could be derived from an on-board camera. This visual sensor provides the robot with an RGB colour signal from three sectors, each 20° wide, together covering an area of 60° in front of the robot. One proximity sensor is also present in each of the three sectors.
All objects in the arena have a colour appearance. In order to present a challenge that requires more complicated neural processing, the power stations blink by alternating their appearance between black and an active colour every five ticks. Depleted power stations do not blink and stay black until they have recharged some energy. A changing environment is simulated by changing the active colour of all power stations.
The current evaluation of a robot is increased every time it is next to an undepleted power station; at the same time, the current charge of that station is decreased. The time needed to fully drain a station is variable: the maximum time is 1,000 ticks, and less time is needed when the robot decelerates. In this way, a robot can harvest from multiple stations within its 2,500 ticks of evaluation time, and the fastest way to gain reward is to come to a full halt next to a power station that is fully charged. Since a rarely visited power station is likely fully charged, exploration and exploitation of all power stations is promoted.

Experimental Setup
The main experiment of this work is a two-stage scenario that we refer to as the change experiment: a randomly generated population is first evolved in one type of environment for a certain time, then the environment changes and the adaptation in the second stage is measured. Compared to using only a single stage, our preliminary runs have shown that this reduces both the bias from the choice of how the initial population is constructed and the variance in the results. The two environments are created by changing the appearance of the power stations. In the first stage, the power stations blink in red, a unique colour signal in the arena and thus an easy challenge. After two million ticks, the colour changes to blue, which is the same colour as other robots and difficult to distinguish because the temporal change of the blinking signal has to be detected. We have shown with the same neuron model that the second stage requires recurrent connections and hidden neurons, whereas top performance is possible in the first stage without hidden neurons (Schwarzer et al., 2011). The experiment continues for 4 million ticks in the second stage for a total of 6 million ticks.
In a second experiment, we test if different speeds of environmental change have an effect. The environment is gradually changed by continuously altering the colour of the power stations by increasing the hue in HSV colour space at maximal saturation and brightness, starting with plain red. This leads to a continuous, repeating cycle through fully saturated colours. By changing the time for a full colour cycle, the speed of environmental change can be varied. The experiment lasts for 6 million ticks and we refer to this experiment as the colour cycle experiment.
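The hue sweep can be illustrated with Python's standard `colorsys` module. The function name `station_colour` and the linear mapping from tick to hue are assumptions; the text only states that the hue increases continuously from plain red at maximal saturation and brightness.

```python
import colorsys
from typing import Tuple


def station_colour(tick: int, cycle_ticks: int) -> Tuple[float, float, float]:
    """Active colour of the power stations as an (r, g, b) triple in [0, 1].

    The hue advances linearly with simulation time (an assumption) and
    wraps around after `cycle_ticks`, starting from plain red at tick 0.
    """
    hue = (tick % cycle_ticks) / cycle_ticks
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)  # full saturation and brightness


# With a cycle time of 100,000 ticks the colour is red at the start
# and has reached cyan (hue 0.5) halfway through the cycle.
print(station_colour(0, 100_000))
print(station_colour(50_000, 100_000))
```

Shortening `cycle_ticks` speeds up the environmental change without altering the sequence of colours.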
The initial population for both experiments is generated with random genomes that encode a fully connected network with hidden neurons. All link weights are initialized uniformly at random with values between ±0.2. One chromosome contains all necessary genes for these structures; it has twice as many gene sites as genes, and the genes are placed on random sites. In addition to the genes for the connected network, six disconnected node genes are added so that treatments with zero initial neurons can still evolve them. In order to obtain a diploid genome, a mutated homologous copy of the first chromosome is added.
The mutation rate is controlled by the mutation factor on the genome. We let this mutation factor evolve slowly but also run trials with a fixed value. The number of initial neurons is varied, which affects the degree of structural evolution needed and the number of initial genes. Since the initial neurons are fully connected, the genome size increases quadratically with more neurons. The colour cycle experiment varies the time needed for a full colour cycle. See Table 1 for an overview of all experimental factors.
The response variable of both experiments, called (collection) performance, is the mean of all evaluations of all four robots in a time frame of 500,000 ticks. This is a fairly long time, encompassing a total of 800 individual evaluations, in order to reduce variance. The theoretical maximum performance is 40, limited by the recharge rate of the power stations. Fifty replicates were done in all factor combinations.
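The figure of 800 evaluations follows from the evaluation length given earlier, assuming that each evaluation lasts exactly 2,500 ticks and that all four robots evaluate back to back (the symbol N_eval is introduced here for illustration only):

```latex
\[
N_{\text{eval}}
  = 4 \times \frac{500{,}000\ \text{ticks}}{2{,}500\ \text{ticks per evaluation}}
  = 4 \times 200
  = 800
\]
```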

Results
Recombination naturally outperformed in the lower mutation factors of 0.031 and 0.125, but at these levels the absolute performance was also reduced. At higher mutation levels, the differences between pure mutation and recombination are small. The effect of the error threshold comes into play only with higher initial network sizes, and thus large genomes. With 0 initial neurons (32 initial genes per chromosome), performance is still high even with a mutation factor of 32, whereas with 14 neurons (458 genes) some treatments have reduced performance at factors 8 and higher (shown in Figure 5). Evolving the mutation rate reduced this pattern only slightly. In the following presentation of results, we use the treatment with 14 initial neurons as a baseline because it shows the biggest differences. The adaptive mate choice generally settled at a low recombination rate of 14.7% (s = 16.8%).
The asterisks in the graphs indicate significance levels (p ≤ 0.05: *; p ≤ 0.01: **; p ≤ 0.001: ***) of a two-treatment comparison using the Wilcoxon signed rank test with n = 50. Outliers in the box and whisker plots are more than 1.5 interquartile ranges outside of the box. The data was analysed using R Version 2.15.2 (R Development Core Team, 2011).

Environmental Change Experiment
The two stages of the change experiment can be seen in Figure 4, where the development of performance over time is shown in the treatment with adaptive mutation rate and 14 initial neurons. After two million ticks, the performance drops sharply when the power stations switch their blinking colour from red to blue. The swarm re-adapts to the new situation and performance rises. A comparison of collection performance at the end of the change experiment with 14 initial neurons, adaptive mate choice and adaptive mutation rate across mutation rates is shown in Figure 5. This is the only configuration where recombination significantly outperformed in mutation factors of 1 and higher (p ≈ 0.114 at factor 1, p ≈ 0.004 at factor 8).
This peculiarity is illustrated in Figure 6, where the relative performance of recombination versus pure mutation is shown across alternative configurations. (A) repeats the boxplots from Figure 5 at a mutation factor of 8; the other configurations shown keep this mutation factor and change one other experimental factor. In (B), random mating is shown instead of the adaptive mate choice. Random mating actually performs better at low mutation rates, but at higher rates the results drop off to significantly lower levels than no recombination (p ≈ 0.019). The adaptive mutation rate is disabled in (C); the general performance drops and the significant difference between the two treatments disappears as well. In (D), 0 initial neurons are used and end performance reaches peak levels, but no significant differences are present. Thus, the advantage of recombination at high mutation levels is only seen with large networks, adaptive mutation rate and adaptive mate choice. However, the adaptive mate choice also never performed worse than without any recombination. Whereas random recombination suffers at high mutation rates and pure mutation suffers at low rates, the adaptive mate choice always reaches top performance.
We found an unexpected result when looking at the sizes of evolved neural networks, shown in Figure 7. Recombination treatments generally evolved smaller networks with fewer neural links. We find strongly significant differences for most of the parameter space we investigated, except in the most extreme mutation rates.

Colour Cycle Experiment
The different rates of change in the colour cycle experiment have an inconclusive effect. End performance is similar across cycle times despite the large range of values covered. This can be seen in Figure 8, where the results with 14 initial neurons, initial mutation factor of 8, adaptive mutation rate and adaptive mate selection are shown. At the shortest colour cycle time of 100,000 ticks, the appearance of the power stations changes from one primary colour to the next within 14 evaluations, but the system tolerates this rapid change well.
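The figure of 14 evaluations can be reproduced with a short calculation, assuming that "from one primary colour to the next" means one third of the hue cycle (for example red to green) and that each evaluation lasts 2,500 ticks:

```latex
\[
\frac{100{,}000\ \text{ticks}}{3} \approx 33{,}333\ \text{ticks},
\qquad
\frac{33{,}333\ \text{ticks}}{2{,}500\ \text{ticks per evaluation}} \approx 13.3
\]
```

The transition therefore completes within 14 consecutive evaluation periods of a single robot.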
We find significantly stronger performance of recombination at cycle times of 250,000 (p ≈ 0.008), 1,000,000 (p ≈ 0.031) and 2,000,000 ticks (p ≈ 0.031) but not at 100,000 and 500,000 ticks. As before in higher mutation rates, random recombination is significantly worse than the other two treatments. The resulting network sizes are also significantly lower here for recombination treatments, shown in Figure 9.

Discussion
At the optimal mutation rate for the given scenario, pure mutation achieves similar performance to recombination. Random recombination effectively increases the effect of mutation: compared to pure mutation, it has good performance at lower rates but suffers from increased mutation rates sooner. This agrees with the findings of Ochoa and Harvey (1999), where recombination reduced the error threshold. The reduced performance at higher mutation rates with larger initial networks is also similar to the results in Ochoa (2006), where the error threshold was lowered with increased genome length.
The adaptive mate choice mechanism, however, always has a strong performance and is robust to different mutation rates. It significantly outperforms at high mutation rates when the mutation rate itself can evolve and when using larger neural networks. This could be related to the effect of the error threshold as originally described in evolutionary biology: as genomes become larger, the maximal mutation rate has to be reduced (Eigen, 1971). Our results indicate that recombination might be able to compensate for the reduced maximal mutation rate with selective mating strategies. It is also likely that better results are possible with more sophisticated mate selection.
Finally, the effect that recombination produces smaller networks is interesting since there are no direct costs for network size in the system. This also seems unrelated to the effect of genome size mentioned before because it is much stronger than the differences in performance. Moreover, only the neural network is reduced; the number of genes stays roughly the same but the number of junk genes is increased. Large networks likely exacerbate the problem of network crossover, which is biggest for the random recombination treatment that also produces the smallest networks. Although the network size had no effect on performance here, smaller networks could easily be considered an advantage in terms of lower computational costs.

Conclusion
In our simulated experiment about the online evolution of neural networks for swarm robots, we found that the difference in performance between pure mutation and recombination depends largely on the mutation rate and the size of the neural network. With the optimal mutation rate, omitting recombination achieved similar performance, but outside of this rate, recombination did outperform, in particular with large networks and adaptive mate selection. Since it can be difficult to estimate the optimal mutation rate before deploying a system, the robustness to suboptimal mutation rates can be an advantage. Because recombination does not generally outperform, our results contrast with the conventional practice of generally including recombination in evolutionary algorithms, and it might be worthwhile in some situations to omit it. The results are also an indication that the discussion in evolutionary biology about the benefits of sex is not completely unrelated to evolutionary computation. While recombination in evolutionary algorithms does not have the same costs of reproduction as sex in biology, it does suffer from genetic disruption due to the breaking of favourable gene combinations, in particular with random recombination. We have shown that a mate selection strategy can mitigate this effect to improve recombination and outperform pure mutation even at elevated mutation rates. This advantage could become even bigger when our artificial organisms require larger, more complex genomes to handle more diverse environments.

Figure 2: Operation of the distributed online evolutionary algorithm (Schwarzer et al., 2011). An island population is an instance of the distributed algorithm that runs on each robot. It maintains a constant number of genomes that serve as parental genome pool from which offspring genomes are created for evaluation. Genomes are occasionally exchanged between island populations.

Figure 3: Screenshot of the experimental arena. The four simulated robots are shown with the approximate coverage of their visual sensors. Ten power stations are present that show their state by their colour: blinking red represents "charged", black "depleted" or "charging".

Figure 4: Development of collection performance of the change experiment. After the colour change at 2 million ticks, the performance drops sharply but the system re-adapts and performance recovers.

Figure 5: End performance of the change experiment with adaptive mate choice, adaptive mutation rate and 14 initial neurons. The effect of various mutation rates can be seen with an optimal rate between 1 and 8. In this configuration, recombination outperforms also at the elevated mutation factor of 8.

Figure 6: Comparison of the end performance of the change experiment with an elevated initial mutation factor of 8 in other configurations: (A) Same as Figure 5. (B) Random mating instead of adaptive mate choice. (C) Fixed mutation rate instead of adaptive rate. (D) 0 initial neurons instead of 14.

Figure 8: End performance of the colour cycle experiment. There is little effect of the different rates of colour change. Adaptive mate choice sometimes outperforms no recombination.

Figure 9: End link count of the colour cycle experiment. Random recombination produces the smallest networks, followed by adaptive mate choice and then no recombination.

Table 1: Factors of the experimental setup. Colour cycle time is only used in the colour cycle experiment.