Criticality as It Could Be: organizational invariance as self-organized criticality in embodied agents

This paper outlines a methodological approach for designing adaptive agents that drive themselves near points of criticality. Using a synthetic approach, we construct a conceptual model that, instead of specifying mechanistic requirements to generate criticality, exploits the maintenance of an organizational structure capable of reproducing critical behaviour. Our approach exploits the well-known principle of universality, which classifies critical phenomena into a few universality classes of systems, independently of their specific mechanisms or topologies. In particular, we implement an artificial embodied agent controlled by a neural network that maintains a correlation structure randomly sampled from a lattice Ising model at a critical point. We evaluate the agent in two classical reinforcement learning scenarios: the Mountain Car benchmark and the Acrobot double pendulum. In both cases the neural controller reaches a point of criticality that coincides with a transition point between two regimes of the agent's behaviour, maximizing the mutual information between neurons and sensorimotor patterns. Finally, we discuss possible applications of this synthetic approach to the comprehension of deeper principles connected to the pervasive presence of criticality in biological and cognitive systems.


Introduction
Biological systems at a wide range of scales -from protein families to brains -show signatures of criticality. These systems do not operate deep within one or another regime of activity; instead, they operate at or near critical points, poised at transitions in their parameter space (Mora and Bialek, 2011). In a nutshell, criticality refers to a distinctive set of properties found at the boundary separating regimes with different dynamics: the transition between an ordered and a disordered phase. Some of these properties include power-law divergences of certain quantities, described by critical exponents, and fractal behaviour (Salinas, 2001). Signatures of criticality have been detected in neural cultures (Schneidman et al., 2006), immune receptor proteins (Mora et al., 2010), the network of genes controlling morphogenesis in fly embryos (Krotov et al., 2014) and lipid membranes (Honerkamp-Smith et al., 2009). Likewise, indicators of critical behaviour have been observed in the human brain (Chialvo, 2014) and in cognitive behavioural patterns (Van Orden et al., 2012). These results hint at general theoretical principles underlying biological self-organization, compelling us to ask what type of mechanisms drive biological systems, at a dauntingly diverse range of levels of organization, to operate near critical points of activity. This question is largely unresolved. Firstly, the connection between experimental descriptive indicators of criticality and models is often fragile (Wagenmakers et al., 2012). Secondly, models of criticality are generally used at the level of analogy, based on specific mechanistic requirements reproducing criticality in particular cases, providing a myriad of different models but often failing to capture explanations in terms of a few more universal classes and properties. During the last decade a new generation of experiments has attempted to go beyond analogies by generating models directly inferred from biological data (Mora and Bialek, 2011). However, the difficulty of inferring general principles from the specific modelled mechanisms is still evident in these models.
In order to progress in the comprehension of critical phenomena, there are some alternatives and suggestions that we could explore with more emphasis. For example, we could try to detect universal mechanisms able to generate critical activity in a wide set of systems and contexts. This approach -a kind of 'Artificial Life route' to self-organized criticality -could promote the development of conceptual models explaining at an abstract level how organisms are driven towards criticality, working as 'proofs of concept' (Barandiaran and Chemero, 2009) and supporting existing and future experimental findings. Specifically, here we focus on the relation of criticality with organizational invariants of the model. By this we mean that, instead of specifying particular mechanistic requirements to generate criticality, we explore the possibility of generating criticality through the conservation of certain invariants in the relations between the components of the system (e.g. correlations between components), i.e. imposing certain patterns on the system's organization. This organizational point of view, in contrast with focusing on intrinsic properties of the components of the system, is supported by the existence of well-known universality classes that provide a unified expression for families of systems operating under criticality (Kadanoff, 2009). Systems in the same universality class, even if they are defined by very different material parameters or physical properties, present the same critical exponents characterizing diverging observables, which are determined by the symmetries of the system.
In line with these ideas, the paper is structured as follows. First, we propose a conceptual model based on statistical mechanics to design an artificial agent that maintains organizational invariants in its structure, specifically a distribution of correlations between the components of the system. Operatively, the model is implemented as a 'Boltzmann machine' neural network reproducing a correlation structure randomly sampled from a 2D lattice Ising model at a critical point. Then, we test the model in two classical examples of learning and control: the Mountain Car and the double pendulum. In both examples we find that agents with no free parameters exploit the whole dynamic range of available configurations, being poised near critical transition points of their parameter space between qualitatively different behavioural regimes. Finally, we discuss the possible applications of this synthetic approach to the comprehension of the deeper principles that govern biological and cognitive systems.

Organizational invariants of self-organized criticality
The development of mechanistic models of criticality in biological systems -based on finding the particular properties and generative mechanisms that support this regime of behaviour -typically assumes a particular way of modelling. Typically, we can find models of criticality exploiting clever local rules generating critical behaviour (e.g. Bak et al., 1987; Bak and Sneppen, 1993), parameter tuning of systems showing critical phase transitions (e.g. Beggs and Plenz, 2003; Kitzbichler et al., 2009) and/or the use of specific structural properties of the underlying network of the system (e.g. Rubinov et al., 2011; Haimovici et al., 2013). However, as we have pointed out above, families of critical phenomena can be classified into universality classes determined by only a few properties of the system. One of these properties is that the critical exponents of all models within a given universality class are exactly the same. For example, in all Ising models on 2D lattices (square, triangular, hexagonal and so forth), spin-spin correlations follow the asymptotic form c(r) ∝ 1/r^η, where η = 1/4 (Salinas, 2001). This surprising property provides a perspective on criticality in terms of universal relations, suggesting that we could model criticality using simple and non-specific mechanisms, independently of the topology of the system. Following this intuition, we propose a model not committed to a particular local behavioural rule or connectivity topology. Instead, our model is built on the maintenance of an organizational structure reproducing the global functional properties of a family of critical models. There exists some experimental evidence showing that, given an Ising model near a critical point, one can build a family of models by learning correlations drawn at random from the original model, which will be poised near a similar critical point (Tkacik et al., 2009). Based on this inspiration, we propose to reproduce and support criticality through the invariant maintenance of an organizational structure of correlations following a 1/r^η law. Thus, instead of restricting a model to a particular set of mechanisms, we find a general organizational distribution which could be easily implemented by very different structures, driving a system to a regime of criticality.
We define our model as a neural network in the form of an Ising model or Boltzmann machine (Ackley et al., 1985) following a maximum entropy distribution of the exponential family

P(s) = (1/Z) exp(−βE(s)),     (1)

where Z is a normalization constant, the energy E(s) = −Σ_i h_i s_i − Σ_{i<j} J_ij s_i s_j of each state is defined in terms of the biases h_i and couplings J_ij between pairs of units, and β = 1/(k_B T), k_B being Boltzmann's constant and T the temperature of the system. Units s_i take discrete values of +1 or −1, while the couplings and biases take continuous values. Without loss of generality we set an operating temperature such that β = 1. To simulate the network, we use Glauber dynamics, by which the value of a unit is inverted at each update with probability

P(s_i → −s_i) = 1/(1 + exp(β ΔE_i(s))),     (2)

where ΔE_i(s) = 2(h_i s_i + Σ_j J_ij s_i s_j) is the energy difference between the inverted and original state. Throughout the paper, we simulate the network by updating its state sequentially, applying Glauber dynamics to all units in the network in a random order at each simulation step.
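As a concrete illustration, one sweep of this Glauber update can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation; it assumes a symmetric coupling matrix J with zero diagonal, and the function name `glauber_step` is our own.

```python
import numpy as np

def glauber_step(s, h, J, beta=1.0, rng=None):
    """One sweep of Glauber dynamics: visit all units in random order,
    flipping each with probability 1 / (1 + exp(beta * dE)).
    Assumes J is symmetric with a zero diagonal (no self-couplings)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(s)
    for i in rng.permutation(n):
        # Energy difference between the inverted and current state of unit i:
        # dE = 2 * (h_i s_i + sum_j J_ij s_i s_j)
        dE = 2.0 * s[i] * (h[i] + J[i] @ s)
        if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
            s[i] = -s[i]
    return s
```

With a strong positive bias and no couplings, a single sweep drives all units to +1 with overwhelming probability, which makes for a simple sanity check.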
In order to induce criticality in its behaviour, our model maintains a specific internal structure of correlations, conserving the statistical properties of a system that we know is at criticality. Since the size of our models will be far from the thermodynamic limit, instead of directly using the asymptotic form c(r) ∝ 1/r^η, we approximate it by computing the correlation structure of a known model showing this distribution in the thermodynamic limit. One of the few specific cases where the Ising model presents an exact solution is a model with zero fields and 2D lattice connectivity, in which a critical point appears at a temperature T_c = 2/ln(1 + √2) ≈ 2.269, in units where J = k_B = 1 (Onsager, 1944). Specifically, we use a 20x20 2D lattice Ising model at critical temperature with periodic boundary conditions. We simulate the model using Glauber dynamics, generating 10^6 samples after initializing the model with 10^5 updates. From this simulation, we obtain the distribution of correlations in the system, c_ij = ⟨s_i s_j⟩, shown in Figure 1.A. Since the fields at all units are zero, the means m_i = ⟨s_i⟩ of all units are also zero.
Once a distribution of correlations is obtained, we generate new models by assigning to each unit and pair of units means and correlations drawn at random from the distribution of the 20x20 Ising model. At this point, the problem is that finding which combination of h_i and J_ij generates a specific combination of m_i and c_ij is not trivial. This is known as the 'inverse Ising problem'. This problem was formulated by Ackley et al. (1985) in their discussion of Boltzmann machines and can be solved by a simple gradient descent rule

Δh_i = µ(m_i − m_i^m),  ΔJ_ij = µ(c_ij − c_ij^m),     (3)

where µ is a constant learning rate, m_i and c_ij are the objective means and correlations of the learning algorithm, and m_i^m and c_ij^m are the means and correlations of the model for the current values of h_i and J_ij. Computing each learning step is computationally costly, since it requires summing over all possible states of s, although approximate methods such as Monte Carlo sampling are generally used to speed up learning. As a demonstration, we apply the learning rule to different models, assigning them objective means and correlations drawn at random from the distribution found for the 20x20 lattice Ising model. For each model, we apply Equation 3 for learning m_i and c_ij, computing the actual m_i^m and c_ij^m with Glauber dynamics. Since precision of learning is not important (the objective is to capture the overall distribution), we do not wait for convergence of the algorithm and simply update the learning rule 1000 times. We used a learning rate µ = 0.01 and computed 1000N samples for each learning step, N being the size of the system.
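The gradient rule of Equation 3 can be sketched as follows. This is a minimal illustration, not the paper's implementation: for tractability it computes the model statistics m_i^m and c_ij^m by exact enumeration of states (feasible only for very small networks) instead of the Glauber sampling used in the paper, and the function names are our own.

```python
import itertools
import numpy as np

def exact_stats(h, J, beta=1.0):
    """Exact means <s_i> and correlations <s_i s_j> by enumerating all 2^n
    states (the paper instead estimates these with Glauber sampling).
    Assumes J symmetric with zero diagonal."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    # E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j = -h.s - 0.5 s^T J s
    E = -states @ h - 0.5 * np.einsum('si,ij,sj->s', states, J, states)
    p = np.exp(-beta * E)
    p /= p.sum()
    m = p @ states
    c = np.einsum('s,si,sj->ij', p, states, states)
    return m, c

def boltzmann_learn(m_target, c_target, mu=0.1, steps=3000):
    """Gradient rule of Eq. 3: nudge h and J toward the target statistics."""
    n = len(m_target)
    h, J = np.zeros(n), np.zeros((n, n))
    for _ in range(steps):
        m, c = exact_stats(h, J)
        h += mu * (m_target - m)
        dJ = mu * (c_target - c)
        np.fill_diagonal(dJ, 0.0)   # no self-couplings
        J += dJ
    return h, J
```

Because fully observed Boltzmann learning is a convex maximum-likelihood problem, this rule converges to parameters reproducing any realizable target statistics.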
Using this method, we apply the learning rule to 10 different models for sizes N = 4, 8, 16, 32, 64. For each model, we test whether the models are at criticality by computing their heat capacity C = β²(⟨E²⟩ − ⟨E⟩²), where E is the energy of the Ising model. The divergence of the heat capacity is an indicator of criticality. We simulate each model for 10^5 steps for different values of β, and we find that all 10 models diverge at the operating temperature of β = 1 (Figure 1.B), showing values of heat capacity similar to the original lattice Ising model with periodic boundaries. Nevertheless, although the distribution of correlations is similar to that of the lattice Ising model, the structure of the model is radically changed. Instead of the original ordered structure of a uniform lattice (Figure 1.C), we now have a disordered distribution of couplings J_ij (Figure 1.D), including both positive and negative couplings. Also, each random selection of correlations yields a completely different arrangement of values of the couplings J_ij.
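The heat-capacity test can be sketched as below, again using exact enumeration over states rather than the 10^5-step Glauber simulation of the paper; for the sizes reported (up to N = 64), exhaustive enumeration is infeasible and sampling estimates of ⟨E⟩ and ⟨E²⟩ would be used instead. The function name is our own.

```python
import itertools
import numpy as np

def heat_capacity(h, J, beta):
    """C(beta) = beta^2 * (<E^2> - <E>^2), computed by exact enumeration.
    A peak (a divergence in the thermodynamic limit) signals a critical point."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    E = -states @ h - 0.5 * np.einsum('si,ij,sj->s', states, J, states)
    w = np.exp(-beta * E)
    p = w / w.sum()
    mean_E = p @ E
    return beta**2 * (p @ E**2 - mean_E**2)
```

As a check, for a single free spin in a field h the closed form is C = β²h²/cosh²(βh), which the enumeration reproduces exactly.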
In the following section, we use this learning rule to drive the neural controller of an embodied agent towards a critical point. In order to do so, we need to take the environment into account during learning. If we consider two interconnected Ising models (one being the neural controller and the other being the environment), Equation 3 holds perfectly if we only apply it to the values of i and j corresponding to units of the neural controller. In our case, we will not use an Ising model as an environment; instead, we will use two classical examples from reinforcement learning. Therefore, our learning rule will be valid as long as the statistics of the environment can be approximated by an Ising model with an arbitrary number of units. Luckily, Ising models are universal approximators (Montúfar, 2014), and any arbitrary environment can be approximated by an Ising model.

Embodied model: Mountain Car and double pendulum
In order to evaluate the behaviour of the proposed learning model, we test it in two embodied situations using the OpenAI Gym toolkit (Brockman et al., 2016). To do so, we define a neural network consisting of an Ising model defined as in Equation 1, describing a network of 8 units (6 hidden units and 2 motor units) plus a variable number of sensor units (4 or 6, depending on the environment). Motor units define the actions performed by the agents, while sensor units are updated directly from the state of the environment. Sensor and motor units are only connected to hidden neurons, while hidden neurons are fully connected to the rest of the system (Figure 2.A). All connections are assigned an objective c_ij value (selected at random from the distribution P(c_ij) in Figure 1.A), and all units except sensor units are assigned an objective mean value of m_i = 0. During learning, the agent applies the rule in Equation 3 to adjust its means and correlations to the objective values. At each simulation step, the Ising model is simulated by updating its units in uniform random order using Glauber dynamics. The first embodiment of the network is the Mountain Car environment (Moore, 1990). This environment is a classical testbed in reinforcement learning depicting an under-powered car that must drive up a steep hill (Figure 2.B). Since gravity is stronger than the car's engine, the vehicle must learn to leverage potential energy by driving up the opposite hill before it is able to make it to the goal at the top of the rightmost hill. In this environment, the horizontal position x of the car is limited to the interval [−1.5π, 0.5π], and the vertical position of the car is defined as y = sin(3x). The velocity in the horizontal axis is updated at each time step as v(t + 1) = v(t) + 0.001a − 0.0025 cos(3x), where a is the action of the motor, which can be either −1, 0 or 1. The sensors of the neural network are defined as an array of four units, which are fed the instantaneous velocity of the car, discretized into an array of 4 bits. Each sensor unit is assigned a value of 1 if its corresponding bit is active and −1 otherwise.
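A minimal Python sketch of this environment step and sensor encoding (our own reconstruction, not the original code) might look as follows. The position update x(t+1) = x(t) + v(t+1) and the exact 4-bit encoding of the velocity are assumptions on our part, since the text does not specify them; here we bin the clipped velocity into 16 equal intervals and use its binary code.

```python
import numpy as np

V_MAX = 0.045  # reduced velocity limit used in the paper

def mountain_car_step(x, v, a):
    """One step of the Mountain Car dynamics described in the text;
    a is the motor action in {-1, 0, 1}. The position update rule is
    assumed to follow the standard benchmark: x(t+1) = x(t) + v(t+1)."""
    v = np.clip(v + 0.001 * a - 0.0025 * np.cos(3 * x), -V_MAX, V_MAX)
    x = np.clip(x + v, -1.5 * np.pi, 0.5 * np.pi)
    return x, v

def velocity_sensors(v):
    """Discretize the velocity into a 4-bit array of +/-1 sensor values.
    The exact encoding is not given in the text; we assume the binary
    code of the velocity binned into 16 equal intervals."""
    bin_idx = int(np.clip((v + V_MAX) / (2 * V_MAX) * 16, 0, 15))
    bits = [(bin_idx >> k) & 1 for k in range(4)]
    return np.array([1 if b else -1 for b in bits])
```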
The second embodiment is a double pendulum or 'Acrobot' (Sutton, 1996), which has to coordinate the movements of two connected links to lift its weight (Figure 2.C). The position of the system is defined by the angles of the two pendulums, θ_1 and θ_2, whose behaviour is governed by the standard Acrobot equations of motion (Sutton, 1996). In these equations, τ is the torque applied to the system, which can be either −1, 0 or 1; m_1 = m_2 = m is the mass of the links; l_1 = l_2 = 1 is the length of the links; l_c1 = l_c2 = 0.5 are the lengths to the center of mass of the links; I_1 = I_2 = 1 are the moments of inertia of the links; and g = 9.8 is the gravity.
In order to make the tasks challenging, we set the maximum velocity allowed for the Mountain Car to ±0.045 (typically it is set to 0.07) and the mass of the Acrobot links to m = 1.75 (typically m = 1). These settings are designed to make it difficult for agents controlled by neural networks with random parameters to solve the task (reaching the top of the environment), which have success rates of 11.2% for the Mountain Car and 7.7% for the Acrobot.
We train 10 agents for each embodiment applying the learning rule from Equation 3, with µ = 0.01. In order to avoid overfitting, we add an L2 regularization term with λ = 0.004. Note that during learning agents have no explicit goal other than adjusting the correlations of the system to a random sample extracted from the probability distribution in Figure 1.A. Agents are initialized at random starting positions at the bottom of their environments (x ∈ [−0.6, −0.4] for the Mountain Car and θ_1, θ_2 and their angular velocities in [−0.1, 0.1] for the Acrobot). The state of the neural network is randomized and the initial parameters h_i and J_ij are set to zero. Agents are simulated for 1000 trials of 5000 steps, computing during each trial the values of m_i^m and c_ij^m and applying Equation 3 at the end of the trial. Note that agents are not reset at the end of each trial.
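A sketch of the end-of-trial parameter update, under the assumption that the L2 regularization enters as a simple weight-decay term (the text does not give its exact form), could look like this; the function name and signature are our own.

```python
import numpy as np

def learning_update(h, J, m_model, c_model, m_target, c_target,
                    mu=0.01, lam=0.004):
    """One end-of-trial update: the gradient rule of Eq. 3 plus an L2
    weight-decay term (our reading of the regularization in the text)."""
    h = h + mu * (m_target - m_model) - lam * h
    J = J + mu * (c_target - c_model) - lam * J
    np.fill_diagonal(J, 0.0)  # no self-couplings
    return h, J
```

Starting from h = J = 0 as in the paper, each trial's statistics nudge the parameters toward the sampled objective correlations while the decay term keeps them bounded.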

Results
In this section, we analyze the behaviour of the neural controllers and the behavioural patterns of the agents with respect to the possibilities of their parameter space. The first striking result is that all 10 agents present quite similar behaviour within each embodiment, despite the fact that each one has learned different values of c_ij and J_ij. In order to compare the agents with other behavioural possibilities, we explore the parameter space by changing the parameter β of the agents. Modifying the value of β is equivalent to a global rescaling of the parameters of the agent, transforming h_i ← β·h_i and J_ij ← β·J_ij, thus exploring the parameter space along one specific direction. For 21 values of β logarithmically distributed in the interval [10^−1, 10^1], we simulate the 10 agents of each embodiment during 10^6 simulation steps, after starting the agents from the random starting position and an initial run of 10^4 simulation steps. We use the results of these simulations for all the analyses in this section.

Signatures of criticality in the neural controller
First, in order to test whether the agents are being driven near a critical point, we analyze signatures of critical behaviour in the neural controller of the agent. Counting the occurrences of each possible state of all the neurons of the agent (including sensor, hidden and motor neurons), we can compute the probability distribution P(s) of the Ising model. We observe that all agents approximately follow Zipf's law at β = 1 for the Mountain Car (Figure 3.A) and Acrobot embodiments (Figure 3.B), with error bars in a very narrow range. The power-law distribution of neural activation patterns suggests that the neural controller of the agents is operating near a critical point.
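A simple way to check this numerically is to rank the observed joint states by frequency and fit the log-log slope of probability versus rank; a slope near −1 indicates Zipf's law. The sketch below is our own illustration of that procedure, not the paper's analysis code.

```python
from collections import Counter

import numpy as np

def zipf_exponent(states):
    """Rank the observed joint states by frequency and fit the slope of
    log P versus log rank; a slope near -1 indicates Zipf's law."""
    counts = sorted(Counter(map(tuple, states)).values(), reverse=True)
    p = np.array(counts, dtype=float)
    p /= p.sum()
    ranks = np.arange(1, len(p) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(p), 1)
    return slope
```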
Since the neural network is now connected to an environment driven by deterministic mechanics, the probability distribution of the Ising neural controller is no longer described by Equation 1. Thus, classical indicators of criticality such as the divergence of the heat capacity are not directly calculable from the energy of the model. However, we can look for other indicators related to second-order phase transitions, computed from the probability distribution P(s) estimated from simulation. One indicator is the behaviour of order parameters such as the entropy of the system, which shows a transition around the critical temperature. For example, if we compute the entropy of the probability distribution of the neural controller, H(s) = −Σ_x P(x) log P(x), for different values of β, we observe that the agent is near an order-disorder transition (Figure 3.C,D). (As well, some Mountain Car agents present a smaller transition at larger values of β, showing that interesting behaviours can also arise in the ordered phase.)
In lattice Ising models, critical points are also indicated by other measures, such as a peak in the mutual information found in the system (Barnett et al., 2013). Mutual information is defined as

I(X; Y) = Σ_{x,y} P(x, y) log [P(x, y) / (P(x)P(y))].

In our case, we measure the mutual information between the set of sensor units s_s and the set s_n containing the rest of the neurons of the controller. The objective is to capture how much information about the sensorimotor coordination of the robot is captured by the system as a whole instead of being contained in the variables alone. In Figure 3.E,F we can observe how the mutual information between sensors and neurons peaks very close to β = 1, suggesting that agents are poised at a point of the parameter space maximizing the exchange of information between the agent and its environment. These results suggest that the agent's neural controller is operating near a point of criticality, resembling a second-order phase transition. As we will see now, not only the agent's neural controller presents indicators of critical activity, but also the behaviour of the agent as a whole.
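The plug-in estimate of I(s_s; s_n) from simulation data can be sketched as follows; this is our own illustration, computing empirical joint and marginal distributions from the observed states (feasible here because the networks are small).

```python
from collections import Counter

import numpy as np

def mutual_information(states, sensor_idx):
    """Empirical I(s_s; s_n) between the sensor units and the remaining
    neurons, estimated from observed joint states (in nats)."""
    sensor_idx = set(sensor_idx)
    joint = Counter()
    for s in map(tuple, states):
        xs = tuple(v for i, v in enumerate(s) if i in sensor_idx)
        xn = tuple(v for i, v in enumerate(s) if i not in sensor_idx)
        joint[(xs, xn)] += 1
    total = sum(joint.values())
    # Marginal counts for the sensor and non-sensor partitions.
    px, pn = Counter(), Counter()
    for (xs, xn), c in joint.items():
        px[xs] += c
        pn[xn] += c
    mi = 0.0
    for (xs, xn), c in joint.items():
        p = c / total
        mi += p * np.log(p / ((px[xs] / total) * (pn[xn] / total)))
    return mi
```

Two perfectly correlated binary units give I = log 2, while independent units give I = 0, which provides a quick sanity check.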

Behavioural transitions in the parameter space
What does it imply for the agent to poise its neural controller at a critical point? It should be remarked here that our agents are given no explicit goal; they only tend to behavioural patterns maintaining a distribution of correlations randomly sampled from the distribution in Figure 1.A. Thus, we explore the effects of crossing the critical point of the neural controller, observing the different behavioural modes of the agent in the parameter space as we change the value of β. The behaviour of the car can be described simply by the position x and speed v at different moments of time. Likewise, the position of the tip of the Acrobot's links gives a good image of the system's behaviour.
In Figure 4.A-C we can observe the behaviour of the Mountain Car for β = {0.5, 1, 2} respectively. We observe that for values of β lower than the operating temperature, the agents are not able to reach the top of the mountain. On the other hand, when β is higher, the agents present more 'rigid' trajectories, going from one top of the mountain to the other.
At β = 1 the agent is able to reach the top of the mountain (note that the peaks of the mountain are located at x = −π/2 and x = π/6) while displaying larger behavioural diversity. Similarly, in Figure 4.D-F, we observe that the Acrobot at β = 1 displays a diverse range of behaviours, being able to reach the top of the plane, while, when β is lowered or increased, it drifts to other, less diverse behavioural regimes. Furthermore, if we compute the median height ỹ of both agents at different values of β (Figure 5), we observe a transition in the parameter space around β = 1, at which both agents maximize the diversity of positions reached in their environment. This suggests a behavioural transition connected with the phase transition of the neural Ising controller, in which the agent maximizes the dynamic range of inputs of the neural controller.

Discussion
Recapitulating the main ideas presented so far, we have shown how, taking a set of correlations chosen at random from a distribution generated by a lattice Ising model at a critical point, we can construct a new model that is also near a critical point of its parameter space. Moreover, requiring an embodied agent to maintain these correlations using a simple learning rule -a sort of organizational homeostasis -drives the agent to a critical point, which coincides with behavioural transitions in its parameter space. This suggests the possibility that criticality could be diagnosed, and even induced, directly from the maintenance of a given distribution of correlations rather than by modelling a precise mechanistic structure. Also, criticality could be caused by quite simple mechanisms relying only on local information, maintaining specific correlations around given values. Here we have implemented the mechanism as a simple Boltzmann learning process, but other rules could have the same effect, such as a combination of Hebbian and anti-Hebbian tendencies in specific ratios.
In our model, we only require the system to maintain a distribution of relations between its components. This connects with systemic approaches to biology interested not in specific or intrinsic components of biological systems but in the networks of relations and processes (Bernard and Greene, 1957; Rosen, 1972; Ashby, 1962), and it is also in line with notions of relational invariance such as Piaget's approach to functional invariants in cognitive development (Piaget, 1971) or Maturana and Varela's idea of autopoietic machines, defined as homeostatic systems that keep constant their own organization as a network of relations between components (Maturana and Varela, 1980).
Assuming a similar systemic perspective, we have derived learning rules for a system that drives itself near a critical point by maintaining an invariant structure of correlations roughly defined by a critical exponent in a 1/r^η law. This promotes a different perspective on criticality. In our model, the distribution of correlations is not the consequence of criticality in a specific topology, but the cause driving an indeterminate topology to a critical point. The question now is whether imposing correlations derived from a 1/r^η function is a strong assumption or implies particularly exigent circumstances. We do not think so, since power-law functions can be naturally generated by simple rules of preferential attachment favouring 'rich-get-richer' cumulative inequalities (Greenwood and Yule, 1920), or arise directly as a natural consequence of certain geometries of space (as e.g. in gravitation laws; see Barrow, 2002).
In this way, our model only assumes that a system is going to adapt to preserve an internal network of relations. It emphasizes the maintenance of organizational structures capable of reproducing the behaviour of living systems, in opposition to approaches relying on internal models of the external sources of sensory input. Thus, it contrasts with other approaches that have focused on understanding criticality as a strategy to effectively represent a complex and variable external world, for example discovering criticality in predictive coding or deep learning architectures dealing with complex inputs (Friston et al., 2012; Lin and Tegmark, 2016; Hidalgo et al., 2014). In those cases, an internalist view is assumed, in which the neural controller represents structures of the external world, whose complexity may be the cause of criticality being present in the neural controller. Instead, our approach is agnostic about what the inputs or the external world of an organism are, and deals only with how an agent rearranges its internal structure when facing different environments.
The agents presented here are not specifically designed for a particular problem. In simple terms, our agents generate (while preserving the same internal neural organization) a wide variability and richness of behaviours (avoiding both disorder and explosive, indiscriminate propagation) that permits them to explore the space of parameters and eventually achieve solutions to problems for which they were not designed. The empirical evidence of the experiments shown here supports this idea. A parallel could be established with the concept of play, understood as a 'rule-breaker' activity that breaks the constraints of stable, self-equilibrating regimes of behaviour, is not directly required by the environment and has no concrete goals (Di Paolo et al., 2010). A model like the one presented here could be used to explore lifelike autonomous behaviour without the need for explicit internal representations, goals, or rule-based behaviour. Conceptual models of critical activity based on the maintenance of a system's relational invariants could help develop a synthetic route towards the exploration of adaptive and embodied criticality.

Figure 1: (A) Distribution of correlation values used for learning, extracted from a 20x20 lattice Ising model at critical temperature. (B) Divergence of the heat capacity in 10 models learning random correlations sampled from Figure 1.A. Maximum and minimum values are shown by the grey area. (C) Coupling matrix of an 8x8 lattice Ising model with periodic boundaries. (D) Coupling matrix of a 64-unit Ising model learning correlations sampled from Figure 1.A. The order of the nodes in the coupling matrix has been rearranged using hierarchical clustering.
Figure 2: (A) Structure of the embodied neural controller. (B) Mountain Car environment: an under-powered car that must drive up a steep hill by balancing itself to gain momentum. (C) Acrobot environment: an agent has to balance a double pendulum to reach the high part of the environment.

Figure 3: Signatures of criticality for 10 different agents in the Mountain Car (left) and Acrobot (right) embodiments. (A-B) Ranked probability distribution function of the Ising models. The real distribution is compared with a distribution following Zipf's law (i.e. P(s) = 1/rank, dashed line). We observe a good agreement between the model and Zipf's law, suggesting critical scaling. (C-D) Entropy of the neural system H(s). A transition is observed near the operating temperature. (E-F) Mutual information between sensors and neurons I(s_s; s_n). In all figures the mean over all agents is shown as a solid line, while the region between the maximum and minimum values of the different agents is shown as a grey area.

Figure 4: Transition in the behavioural regime of the agents. We show the behaviour of two agents for different values of β in the Mountain Car (A, B, C) and Acrobot (D, E, F) embodiments. We observe that β = 1 is a transition point between two modes of behaviour in both agents.

Figure 5: Behavioural transitions in the (A) Mountain Car and (B) Acrobot environments, showing the median height ỹ of the agent, together with the 5% and 95% percentiles in the Mountain Car and the 5%, 95% and 99.9% percentiles in the Acrobot (dotted lines).