The Evolution of Assortment with Multiple Simultaneous Games

Current theories of social evolution predict the direction of selection for a given level of assortment. What remains unclear is how to determine the direction of selection on assortment itself if this were subject to evolutionary change. Here we define and analyse a simple model that allows us to investigate the evolution of assortment. We find that there is only a positive selection gradient for increased assortment if the population is polymorphic in the cooperative trait. We further show that if the individuals in question engage in multiple cooperative dilemmas simultaneously then there may be a continued selection on increased assortment which is ultimately sufficient to resolve severe dilemmas such as the prisoner’s dilemma.


Introduction
The evolution of cooperation was a problem which Darwin labelled "his one special difficulty". In a naïve interpretation, the presence of cooperation (and particularly altruism) presents a fundamental challenge to the Darwinian view of nature. Why would individuals be selected to perform actions which are beneficial to others at a cost to themselves? The two major attempts to answer this question come in the form of kin selection (Hamilton, 1964; Gardner et al., 2011; Queller, 1985; Maynard Smith, 1964) and group selection (Wilson, 1975; Borrello, 2005). These two processes have been shown to be mathematically equivalent (Lehmann et al., 2007b; Foster, 2006; Queller, 1992), as they both essentially depend on population structure that gives rise to assortment of interactions. Here assortment means that like individuals interact more often than would be expected from random interactions. Self-interested individuals will cooperate in an assorted population because, by virtue of being a cooperator, they are more likely to receive the benefits of other cooperators. In agreement with a number of authors, we see assortment as the key factor in the evolution of cooperation (Eshel and Cavalli-Sforza, 1983; Fletcher and Zwick, 2006; Godfrey-Smith, 2008; Michod and Sanderson, 1985; Sober, 1992).
An outstanding puzzle for the theoretician studying the evolution of cooperation is how assortment/relatedness itself evolves. In many instances in nature, individuals may have traits which in effect modify which other members of the population they interact with. For instance, dispersal rate (Smaldino and Schank, 2012; Pepper and Smuts, 2002; Lehmann et al., 2007a) may have a genetic component and hence be subject to selection. The vast majority of studies of the evolution of cooperation take such parameters as given; a more complete understanding would be facilitated by studying, in the most general and therefore most abstract setting possible, how genetic traits affecting assortment coevolve with genetic traits determining social behaviours such as cooperation. A small number of recent papers have begun to look at such processes. Notably, Powers et al. (2011) study a model in which individuals play a public goods game in a group-structured population. In addition to a gene controlling the social strategy (i.e. cooperate or defect), they also look at the concurrent evolution of a gene which determines the group size preference of the individuals. Individuals disperse and join new groups, and the sizes of these groups are determined genetically; thus individuals may prefer to be in either larger or smaller groups. Because a population composed of small groups is more highly assorted than a population composed of large groups, this "group size preference" has the effect of an assortment parameter. In such a model it is found that the coevolution of population structure and social strategy leads to a feedback process whereby the evolution of cooperation and the evolution of social structures that support cooperation facilitate each other, so that high levels of cooperation are eventually selected for. Jackson and Watson (2015) use the formalism of a metagame to investigate the evolutionary dynamics of game-changing behaviour such as assortment. In their model each agent has a genetically determined payoff matrix representing a desired game, as well as a gene determining their social strategy. Unlike conventional game-theoretic studies, which simply investigate the dynamics of a given game, this model allows the underlying game to be altered via the process of natural selection. They find that a strong linkage disequilibrium emerges, whereby cooperators choose a game that is favourable to cooperators, and likewise for defectors. Which of these two strategies ultimately wins depends upon the equilibrium properties of the game. If the equilibrium of the initial game is dominated by cooperators, then selection will change the game towards the harmony game, thereby further entrenching cooperation. However, if defection dominates at equilibrium, then selection will move the game towards a more severe dilemma, and thus further entrench defection. For selection to have any effect on the underlying game there must be some polymorphism in the social strategy, which is not the case in the prisoner's dilemma at equilibrium.
One of the intermediate aims of this paper is to present a model for the evolution of assortment which is more abstract, and hence more general, than any previous model. This is desirable as it allows one to dispose of many arbitrary modelling assumptions. We look at the coevolution of assortment and social strategy in a cooperative dilemma and investigate under what conditions there is a positive selection gradient on increased assortment. In agreement with previous studies, we find that such a gradient only exists if there is currently a polymorphic level of cooperation in the population. In the language of two-player games this is a situation represented by a snowdrift game. We find that games such as the prisoner's dilemma, which have no cooperation at equilibrium, do not result in selection for increased assortment. Thus, such a mechanism cannot "resolve" a prisoner's dilemma. Note that the prisoner's dilemma game represents the biological scenario of strong altruism, which is often observed in nature (West et al., 2007).
The second contribution of this paper is to show a plausible scenario in which selection can increase assortment to levels high enough to "solve" the prisoner's dilemma (or, in other words, levels high enough to observe strong altruism). The principal idea is that individuals will be engaged in multiple social interactions at once, assumed to be controlled by genes at different loci. Thus, any two individuals will engage in a large number of social interactions simultaneously. A simple example may be a species of bacteria that can produce a number of public goods, each of which is simply a protein. Each individual bacterium may or may not produce each public good. Thus, the multiple interactions within the species can be represented via a series of games (unlike conventional studies, which consider only one game taking place). Each individual may be a cooperator or a defector in each game, independently of whether it cooperates or defects in other games. In such an instance it may be the case that one of these games is a snowdrift game and hence provides a positive selection gradient for increased relatedness, until a sufficient level of assortment has arisen to fix cooperation at this locus. As a by-product of this process, other games will become transformed such that they are then polymorphic (i.e. contain a mixture of both cooperators and defectors). Given enough games between individuals there will be continual selection on increased assortment, so that the population ends up highly assorted and cooperation can therefore evolve even for much more severe dilemmas.
Very few authors have looked at the outcome of multiple games being played at once or sequentially between agents/individuals. Those that do, do so from within economics or psychology. In particular, a number of authors (Bednar and Page, 2007; Bednar et al., 2012; Grimm and Mengel, 2012) have looked at the consequences of multiple, and qualitatively different, games being played in sequence between subjects. The key themes of these papers tend to concern cognitive spill-over, i.e. how the outcome of one game might affect another. Otherwise they concern the cognitive load on the individual, i.e. how individuals might use heuristics or rules learnt in one game to reduce the computation needed to solve other games. To the best of our knowledge no authors have looked at the dynamics of multiple games from within evolutionary game theory. One reason for the lack of such a study is that the basic result, i.e. that each game reaches the ESS independently, is not interesting unless one allows for some manner of epistatic interaction between games. In our model the epistasis comes via the intermediary of the evolving assortment parameter. With this interdependence the presence of multiple games results in a qualitatively different outcome, as we shall show.

Model
We shall restrict our analysis to those social interactions which can be represented via pair-wise interactions. Such interactions can be represented via a two-player game. Santos et al. (2006) show that the space of all possible two-player, two-strategy, symmetric games is two dimensional. The payoff matrix of such a game is given by:

$$M = \begin{pmatrix} 1 & S \\ T & 0 \end{pmatrix} \tag{1}$$

in which rows give the focal player's strategy and columns the opponent's strategy (cooperate first, defect second). The reward for mutual cooperation is normalised to one, and the punishment for mutual defection to zero. $T$ parameterises the temptation to defect against a cooperator, and $S$ the sucker's payoff received by cooperating against a defector.
One can neatly handle the effects of assortment via a transformation of the game. A level of assortment, $\alpha$, is defined as follows: with probability $\alpha$ an individual is paired with another individual with the same strategy as itself, and with probability $1 - \alpha$ it is paired with a random individual. It can be shown (van Veelen, 2011) that the outcome of a game $M$ under assortment $\alpha$ is equivalent to the outcome of the game $M'$ with no assortment, where:

$$M' = \begin{pmatrix} 1 & \alpha + (1 - \alpha) S \\ (1 - \alpha) T & 0 \end{pmatrix} \tag{2}$$

Thus one can consider assortment as an effective transformation of the game into a more harmonious one, as Figure 1 illustrates.
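The transformation can be made concrete with a short sketch. It assumes that the pairing rule above implies effective payoffs $S' = \alpha + (1 - \alpha)S$ and $T' = (1 - \alpha)T$ (a cooperator paired with its own type earns 1, a defector paired with its own type earns 0). The function names `transform_game` and `classify`, and the handling of region boundaries, are illustrative choices and are not taken from the original model.

```python
def transform_game(S, T, alpha):
    """Effective (S', T') of the well-mixed game assumed equivalent to
    playing (S, T) under assortment alpha (R = 1, P = 0 are unchanged)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    S_eff = alpha + (1.0 - alpha) * S
    T_eff = (1.0 - alpha) * T
    return S_eff, T_eff


def classify(S, T):
    """Name the region of the TS plane; boundary handling is an assumption."""
    if T <= 1.0 and S >= 0.0:
        return "harmony"
    elif T > 1.0 and S > 0.0:
        return "snowdrift"
    elif T <= 1.0 and S < 0.0:
        return "stag hunt"
    else:  # T > 1 and S <= 0
        return "prisoner's dilemma"
```

Under these assumptions a prisoner's dilemma with $S = -0.2$ and $T = 1.8$ becomes a snowdrift game at $\alpha = 0.25$, since $S' = 0.1$ and $T' = 1.35$.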
Agents play many games simultaneously; the strategy in each is determined at a separate locus. It is thus possible for an agent to cooperate in some games whilst defecting in others. All agents play all games with all other agents. The overall payoff an agent receives is simply the mean of its payoffs from all games.
There are $N_G$ games being played at once. An array of matrices determines all of the games, such that the $k$th game is given by:

$$M_k = \begin{pmatrix} 1 & S_k \\ T_k & 0 \end{pmatrix}$$

Each value of $S_k$ and $T_k$ is chosen at random from a uniform distribution, with $S_k \in [-1, 1]$ and $T_k \in [0, 2]$.
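A minimal sketch of this set-up is given below, assuming each game is stored as an $(S, T)$ pair and each genome as a list of 0/1 strategy alleles with one locus per game. The helper names (`sample_games`, `payoff`, `mean_payoff`) are hypothetical and chosen only for illustration.

```python
import random


def sample_games(n_games, rng):
    """Draw n_games (S, T) pairs with S ~ U[-1, 1] and T ~ U[0, 2]."""
    return [(rng.uniform(-1.0, 1.0), rng.uniform(0.0, 2.0))
            for _ in range(n_games)]


def payoff(my_move, other_move, S, T):
    """Focal player's payoff; moves are 1 (cooperate) or 0 (defect)."""
    if my_move == 1:
        return 1.0 if other_move == 1 else S
    return T if other_move == 1 else 0.0


def mean_payoff(my_genome, other_genome, games):
    """Mean payoff over all games, with one strategy locus per game."""
    total = sum(payoff(a, b, S, T)
                for a, b, (S, T) in zip(my_genome, other_genome, games))
    return total / len(games)
```

For instance, `mean_payoff([1, 0], [0, 1], [(-0.5, 1.5), (0.5, 0.5)])` averages the sucker's payoff of the first game with the temptation payoff of the second.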
In addition to social strategies in multiple games, the agents have a gene for a desired level of assortment, $\alpha_i$. With probability $\alpha_i$ agent $i$ interacts with a clonally related individual, that is, one that has the same allele at every locus. With probability $1 - \alpha_i$ it enters a pool of players, and therefore interacts with an agent chosen randomly from the subset of those other individuals who have also chosen to enter the pool.
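One possible reading of this pairing rule is sketched below. Rather than drawing pool membership explicitly, the sketch samples a pool partner $j$ with weight $1 - \alpha_j$, which is an approximation introduced here; the fallback used when no other agent is available is likewise an assumption, and `choose_partner` is a hypothetical name.

```python
import random


def choose_partner(i, alphas, rng):
    """Return the index of agent i's partner; returning i itself denotes
    an interaction with a clone (identical at every locus)."""
    if rng.random() < alphas[i]:
        return i  # assorted interaction with a clonal partner
    others = [j for j in range(len(alphas)) if j != i]
    # Assumption: agent j is in the pool with probability 1 - alpha_j,
    # approximated here by weighted sampling.
    weights = [1.0 - alphas[j] for j in others]
    if sum(weights) == 0.0:
        return i  # assumption: empty pool, fall back to a clone
    return rng.choices(others, weights=weights, k=1)[0]
```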
Selection proceeds in a generational genetic algorithm (GA) using fitness-proportionate selection. Mutation may occur at a locus controlling social strategy; with probability $\mu_s$ each locus changes to its opposite strategy. The assortment gene is mutated with probability $\mu_a$, and then changes by an amount drawn from a normal distribution with mean zero and standard deviation 0.05 (the results do not depend sensitively on the particular choice of these parameters). If the value mutates outside of the permitted range it is simply set back to the nearest bound (zero or one). We assume that the time scale of evolution is slower for assortment than it is for game strategy, so that the game strategy is always in equilibrium and mutations in $\alpha$ occur gradually. We also assume that the primitive state of the population is freely mixed. We thus begin our simulation with $\alpha = 0$ for all agents, and allow strategy frequencies to reach equilibrium before 'turning on' a small mutation in $\alpha$.
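The mutation and selection operators described above might be sketched as follows. The text does not say how negative payoffs are handled under fitness-proportionate selection, so the shift applied in `select_parents` is an assumption made here, as are the function names and the genome encoding (a list of 0/1 alleles).

```python
import random


def mutate(genome, alpha, mu_s, mu_a, rng, sigma=0.05):
    """Flip each strategy locus with probability mu_s; with probability
    mu_a perturb alpha by Gaussian noise and clip it to [0, 1]."""
    new_genome = [1 - g if rng.random() < mu_s else g for g in genome]
    if rng.random() < mu_a:
        alpha = min(1.0, max(0.0, alpha + rng.gauss(0.0, sigma)))
    return new_genome, alpha


def select_parents(fitnesses, n, rng):
    """Fitness-proportionate selection with replacement.

    Assumption: fitnesses are shifted so that every weight is positive,
    because payoffs in these games can be negative."""
    low = min(fitnesses)
    weights = [f - low + 1e-9 for f in fitnesses]
    return rng.choices(range(len(fitnesses)), weights=weights, k=n)
```

A full generation would evaluate fitness for every agent, call `select_parents` to pick the next generation, and apply `mutate` to each offspring.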
In addition to this there is also a small cost to being assorted, $k \times \alpha_i$, which increases linearly with the agent's assortment (typically $k = 1 \times 10^{-3}$ unless otherwise stated). This cost is introduced to ensure that all change in assortment is adaptive rather than due to drift: when all else is equal, selection will not favour an increase in assortment. The cost is kept small so as not to affect the direction of selection significantly in cases other than drift.

Results
We proceed by investigating special cases of the more general model. Firstly, we look at a model in which assortment evolves but there is only one game being played. Secondly, we look at multiple games, but in the absence of assortment. Finally, we analyse the full model with both evolvable assortment and multiple games.

The Evolution of Assortment with a Single Game
We first sketch a mathematical argument, and then compare it with a simulation-based model.
Let the frequency of cooperators in the pool be $p_C$, and likewise $p_D$ for defectors. Then the payoff an individual with strategy $i \in \{C, D\}$ gets (as a function of $\alpha$) is:

$$\pi_i(\alpha) = \alpha M_{ii} + (1 - \alpha)\,(p_C M_{iC} + p_D M_{iD})$$

$p_C$ is calculated by taking a mean of the number of cooperators, weighted by the chance that they enter the pool, which is $1 - \alpha_i$. $E_C$ is the expected number of cooperators in the pool:

$$E_C = \sum_i s_i (1 - \alpha_i)$$

where $s_i = 1$ for a cooperator and 0 for a defector, and $i$ here indexes agents. The sum runs over all members of the population. Similarly for defectors:

$$E_D = \sum_i (1 - s_i)(1 - \alpha_i)$$

It follows that:

$$p_C = \frac{E_C}{E_C + E_D}, \qquad p_D = \frac{E_D}{E_C + E_D}$$

To investigate this model further one simply needs to determine when individuals with a slightly larger than normal level of desired assortment can invade a population. We consider a population composed of individuals who all have the same value for desired assortment $\alpha$, and then simply ask whether a mutant with a slightly larger value of desired assortment $\alpha + \delta\alpha$ can invade this population. This can be more simply answered by looking at the payoff and asking whether or not it is an increasing function of $\alpha$. The payoff to cooperators is given by:

$$\pi_C(\alpha) = \alpha + (1 - \alpha)(p_C + S p_D) = (p_C + S p_D) + \alpha\,(1 - p_C - S p_D)$$

Because $p_C + p_D = 1$, the coefficient of $\alpha$ equals $p_D (1 - S)$. Because $S < 1$ it follows that $(1 - p_C - S p_D) > 0$ whenever defectors are present in the pool, and thus $\pi_C(\alpha)$ is an increasing function of $\alpha$, which means that there is always a selection pressure for existing cooperators to increase their values of $\alpha_i$.
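The quantities in this argument can be checked numerically with a small sketch. It assumes the cooperator payoff takes the form $\pi_C(\alpha) = \alpha + (1 - \alpha)(p_C + S p_D)$ and that pool frequencies are the $(1 - \alpha_i)$-weighted fractions of each strategy; the function names are illustrative only.

```python
def pool_frequencies(strategies, alphas):
    """Return (p_C, p_D): strategy frequencies in the pool, weighting each
    agent by its probability 1 - alpha_i of entering the pool.
    strategies[i] is 1 for a cooperator and 0 for a defector."""
    e_c = sum(s * (1.0 - a) for s, a in zip(strategies, alphas))
    e_d = sum((1 - s) * (1.0 - a) for s, a in zip(strategies, alphas))
    total = e_c + e_d
    if total == 0.0:
        raise ValueError("the pool is empty")
    return e_c / total, e_d / total


def cooperator_payoff(alpha, p_c, S):
    """Assumed payoff of a cooperator with assortment alpha when the pool
    contains a fraction p_c of cooperators (R = 1)."""
    p_d = 1.0 - p_c
    return alpha + (1.0 - alpha) * (p_c + S * p_d)
```

With $p_C = 0.5$ and $S = 0.3$, for example, the payoff rises from 0.65 at $\alpha = 0$ to 1 at $\alpha = 1$, in line with the monotonicity argument.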
What if the population consists entirely of unassorted defectors? We then wish to know the value of $\alpha$ for which cooperators can invade, i.e.:

$$\pi_C(\alpha) > \pi_D(0) \tag{12}$$

Because we consider cooperators invading in infinitesimal quantities, we assume $p_C = 0$ and $p_D = 1$. Equation (12) reduces to:

$$\alpha + (1 - \alpha) S > 0$$

Therefore, in the limit of an infinitesimal increase in $\alpha$, cooperators can only invade if $S > 0$ (found by setting $\alpha = 0$ in the above equation); that is, $\alpha$ will only increase in a snowdrift game.
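As a worked illustration, assume the invasion condition in a population of unassorted defectors reduces to $\alpha + (1 - \alpha)S > 0$ (cooperator payoff with $p_C = 0$, $p_D = 1$, against a resident defector payoff of zero). Rearranging for $S < 1$ gives $\alpha > -S / (1 - S)$. The helper names below are hypothetical.

```python
def cooperator_can_invade(alpha, S):
    """True if a rare cooperator with assortment alpha out-earns resident
    unassorted defectors (whose payoff is 0 when p_C = 0)."""
    return alpha + (1.0 - alpha) * S > 0.0


def min_alpha_for_invasion(S):
    """Smallest alpha (clamped at 0) above which rare cooperators invade;
    derived from alpha + (1 - alpha) * S > 0 and valid for S < 1."""
    if S >= 1.0:
        raise ValueError("the derivation assumes S < 1")
    return max(0.0, -S / (1.0 - S))
```

For $S = -0.5$ the threshold is $\alpha = 1/3$, a finite jump that gradual mutation from $\alpha = 0$ cannot make under the small cost of assortment, whereas for any $S > 0$ the threshold is zero.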
The above argument is a sketch of a formal mathematical argument in which a number of simplifying assumptions were necessary. We complement this argument with an agent-based genetic algorithm, which includes stochastic effects not present in the mathematical model. We run the full simulation model, but set $N_G = 1$. We record the mean level of $\alpha$ and the mean level of cooperation at equilibrium, and plot these over the space of all possible games on the $TS$ plane. The results are presented in Figure 2. The conclusion of the model is that positive selection for assortment will only occur if there exists some preliminary level of cooperation at equilibrium. As there is mutation on strategy, there is also a small amount of selection pressure to increase assortment even in games which are totally dominated by cooperators (as is the case in the harmony game and some of the stag hunt game). This is to protect against the occasional introduction of defectors into the population. In the snowdrift regions in which the population is almost all cooperators, and also in the region in which S 1, there is little increase in assortment. Although a purely mathematical argument asserts that there will be selection for assortment, stochastic effects and the small cost to assortment counteract this effect in the marginal cases.

Multiple Games in the Absence of Evolvable Assortment
This model is simply the full model minus the possibility of evolvable assortment. Thus, each allele represents the social strategy in an independent game. We find that the equilibrium level for each game is simply equal to the game's ESS. In other words, the presence of multiple games does not affect the outcome of selection. This is to be expected, as there is no epistasis between the different alleles. Figure 3 shows a typical output of this model. For ease of interpretation we create a scatter plot of the games being played on the $TS$ plane alongside the evolution of strategy frequencies.
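The per-game prediction can be sketched using the standard replicator-dynamics result for a well-mixed game with payoffs $R = 1$, $P = 0$: the interior rest point is $p^* = S / (S + T - 1)$, stable in the snowdrift region and unstable in the stag hunt region. The function below is an illustration under those assumptions; its name, its boundary conventions and the default initial frequency `p0` are choices made here rather than details from the text.

```python
def equilibrium_cooperation(S, T, p0=0.5):
    """Predicted long-run cooperator frequency in a well-mixed population
    playing the game (S, T), starting from cooperator frequency p0."""
    if T > 1.0 and S > 0.0:
        # Snowdrift: stable mixture of cooperators and defectors.
        return S / (S + T - 1.0)
    if T <= 1.0 and S >= 0.0:
        # Harmony: cooperation dominates.
        return 1.0
    if T > 1.0 and S <= 0.0:
        # Prisoner's dilemma: defection dominates.
        return 0.0
    # Stag hunt (T <= 1, S < 0): bistable, decided by the initial condition.
    p_unstable = S / (S + T - 1.0)
    return 1.0 if p0 > p_unstable else 0.0
```

Because each locus is treated independently, applying this function to every $(S, T)$ pair gives one predicted frequency per game.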

The Evolution of Assortment with Multiple Games
Finally, we present findings for the full version of the model, in which assortment is subject to selection and the agents engage in multiple games. We choose seven random points on the $TS$ plane and let the simulation run to equilibrium.
The key finding here is that the presence of multiple games allows for a continuous selective pressure towards increased assortment. Each social strategy gene is not directly affected by other social strategy genes. However, there is epistasis between them via the intermediary of the assortment allele. Any game that is polymorphic (i.e. is a snowdrift game) creates a selection pressure for increased assortment. This continues until the game is transformed into a Harmony game. Other, non-polymorphic games do not give rise to any selection pressure on assortment. However, a certain level of assortment may transform a prisoner's dilemma into a snowdrift game, thus engaging further selection pressure for increased assortment. This process may repeat until a level of assortment has evolved that is sufficient to resolve all such dilemmas. Whether this occurs depends upon the details of where the various games lie, but the probability of at least one game being polymorphic clearly increases with the number of games being played. Figure 4 shows one example of how this process may work, and one example in which it fails to do so. In the example in which there is no significant increase in assortment, the initial distribution of games does not include a snowdrift game, whereas in the other example the initial distribution does include a snowdrift game, thus eventually providing enough assortment to resolve the prisoner's dilemma. This particular instance of the model uses a population size of 1028 and $\mu_a = \mu_T = 5 \times 10^{-3}$, and is run for 1600 generations.
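The feedback process can be illustrated with a deterministic sketch that ignores the cost of assortment and all stochastic effects. It assumes that a game $(S, T)$ is effectively a snowdrift game exactly when $\alpha$ lies strictly between $-S/(1 - S)$ and $1 - 1/T$ (from effective payoffs $S' = \alpha + (1 - \alpha)S > 0$ and $T' = (1 - \alpha)T > 1$), and that $\alpha$ keeps rising while at least one game is polymorphic. The function names are hypothetical.

```python
def polymorphic_interval(S, T):
    """Open interval (lo, hi) of alpha over which the game is effectively
    a snowdrift game, or None if it is never polymorphic."""
    if T <= 1.0:
        return None  # temptation never exceeds the reward
    lo = float("-inf") if S >= 1.0 else -S / (1.0 - S)
    hi = 1.0 - 1.0 / T
    return (lo, hi) if lo < hi else None


def final_assortment(games):
    """Level of alpha reached from alpha = 0 if selection raises alpha
    whenever some game is polymorphic (idealised, cost-free sketch)."""
    intervals = sorted(iv for iv in (polymorphic_interval(S, T)
                                     for S, T in games) if iv is not None)
    alpha = 0.0
    for lo, hi in intervals:
        if lo >= alpha:
            break  # no game is polymorphic at this alpha: selection stalls
        alpha = max(alpha, hi)
    return alpha
```

With `games = [(0.5, 1.2), (-0.1, 1.5)]` the snowdrift game carries $\alpha$ to about 0.17, by which point the prisoner's dilemma has itself become a snowdrift game and carries $\alpha$ on to $1/3$; with the prisoner's dilemma alone, $\alpha$ stays at zero.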

Discussion
It is known in the field of evolutionary biology that population structure is important when considering the evolution of social traits (Eshel and Cavalli-Sforza, 1983). Specifically, positive assortment can allow for the evolution of cooperative or altruistic behaviour. Formalisms such as kin selection (Hamilton, 1964; Grafen, 1982; Maynard Smith, 1964) and multi-level selection (Price, 1970; Okasha, 2009) allow one to make precise calculations of the expected change in the frequency of a cooperative allele due to selection (see for example Gardner et al. (2011)). Furthermore, the natural world is full of examples of cooperative behaviour, such as the cooperation between cellular slime moulds (Strassmann et al., 2011), eusocial insects (Hölldobler and Wilson, 1990) or the cells of a multicellular organism (Michod and Roze, 2001; Buss, 1987; Queller, 2000). In many cases the population structure of such organisms may have a genetic component and thus be subject to evolutionary change. For example, trees may alter the dispersal radius of their seeds (Pepper and Smuts, 2002), birds may alter the time at which they leave their natal group (Bulmer, 1994), and social wasps may alter the number of eggs they lay in one host (Ode and Strand, 1995). In addition, multicellular organisms undergo a unicellular bottleneck, which may be an adaptation that increases the genetic homogeneity of the organism (Ryan and Watson, 2015). Whilst it is at least plausible that many of these features are adaptations, a unified theoretical understanding of the conditions under which increased positive assortment will evolve is lacking.
Recently a number of authors (Powers et al., 2011; Jackson and Watson, 2015) have begun to address this issue. One point that has emerged from these studies, and is backed by a formal mathematical argument here, is that selection will not increase positive assortment unless the underlying game is polymorphic. Games such as the prisoner's dilemma are not polymorphic at equilibrium, and therefore selection on assortment cannot "get started". Furthermore, these games represent the biologically prevalent case of strong altruism (see for example Doncaster et al. (2013)). We have presented a potential resolution to this problem: if individuals interact in many social dilemmas simultaneously, then there may exist a feedback process whereby the weaker dilemmas transform the stronger dilemmas into weaker ones, and eventually all dilemmas are resolved.
Biologically this could represent, for example, the evolutionary path towards multicellularity (see also Michod and Roze (2001); Maynard Smith and Szathmary (1997); Ispolatov et al. (2012); Grosberg and Strathmann (2007); Jablonka and Lamb (2006)). The cells of a proto-organism will eventually need to cooperate in many different ways, such as by producing different public goods with differing costs, or by refraining from different forms of selfish reproduction, each with a different benefit. Our model shows how this can be thought of as a continuous process, rather than a binary one (see Godfrey-Smith (2009)). The likelihood of this happening is greatly increased as the number of dilemmas being played increases. It seems plausible that the case of individuals playing one single social dilemma is an idealisation, and that in reality individuals, having many genes and many potential social interactions, will usually be engaged in a very large number of social interactions at once, thus making the transition to social living possible.

Figure 1: The space of all two-player symmetric cooperative dilemmas and the effects of assortment. Each point in the space represents a two-player game parameterised via $S$ and $T$; the colour represents the equilibrium level of cooperation reached from an initial condition of one half cooperators (one represents cooperate and zero defect). Each arrow represents the effective transformation of the game under assortment of $\alpha = 0.07$. Assortment has the effect of transforming the game towards the Harmony (top left) region of game space, thus making it more cooperative.

Figure 2: Equilibrium levels of cooperation (top) and $\alpha$ (bottom) over the space of all possible games. Only snowdrift games provide a positive selective gradient on $\alpha$.

Figure 3: Many random games being played. We record the frequency of the population that plays cooperate at each allele. This matches predictions for the ESS of each game independently (dotted lines).

Figure 4: The dynamics of a simulation with 7 randomly chosen games. In each panel: top left, change in the mean value of $\alpha$ over time; bottom left, change in allele frequency for each game locus; right, position of games in $ST$ space, where dotted lines represent the effective transformation of the game due to increasing $\alpha$.