Towards designing artificial universes for artificial agents under interaction closure

We are interested in designing artificial universes for artificial agents. We view artificial agents as networks of high-level processes on top of a low-level detailed-description system. We require that the high-level processes have some intrinsic explanatory power, and we introduce an extension of informational closure, namely interaction closure, to capture this. We then derive a method to design artificial universes in the form of finite Markov chains which exhibit high-level processes satisfying the property of interaction closure. We also investigate control, or information transfer, which we see as a building block for networks representing artificial agents.


Introduction
We are interested in designing artificial physics for artificial agents. This paper presents an exploratory step in this direction and also expounds the conceptual and formal point of view we are taking. In this introduction we give a short overview of our approach and then proceed to formally define the different elements.
Conceptually, we draw inspiration for our artificial physics and agents from "real" physics and living organisms. The artificial agents we have in mind are minimally represented by networks of "high-level" or "macroscopic" processes. These high-level processes are derived from the underlying artificial physics. This situation is analogous to viewing living organisms as networks of processes (Maturana and Varela, 1980) on a meso- or macroscopic scale, e.g. proteins or cells, and assuming an underlying physics, e.g. elementary particle physics. Formally, we model our artificial physics simply as a univariate finite discrete-time Markov process. We choose a univariate process because we do not want to presuppose any structure of the state space of the artificial physics. We also assume there is no downward causation (Campbell, 1974). This means that at all times the high-level processes are causally dependent on the underlying physics. Loosely speaking, the edges (interactions) of the high-level network of processes representing the agent are actually mediated by the low-level process. As we will see, this can formally be modelled using Bayesian networks.
The final ingredient of our general approach tries to account for the success of doing science on scales larger than elementary particles, e.g. atomic physics, chemistry, and biology. To take this into account, we require that the high-level processes are as predictive of other high-level processes as the underlying physics itself. In other words, the high-level processes at least appear to be directly causally related. Formally, we achieve this by slightly extending the notion of informational closure introduced by Bertschinger et al. (2006) to two notions that we will call weak and strong interaction closure. Requiring informational closure already puts some constraints on the underlying process (Pfante et al., 2014), and so do interaction closures.
Within this general setting we here inspect the situation where one high-level process seems to control another one. The idea is that any high-level network that represents an agent needs such a mechanism. Consider for example a sensor that writes its measurement to another process, e.g. a memory, for further processing. Another interpretation would be that the controlled process is part of the embodiment of the agent and therefore within the sphere of influence of the agent and shielded from the environment. The latter interpretation is related to the notion of embodiment put forward by Porr and Wörgötter (2005). Yet another, more conservative, interpretation would be that the first process simply transfers information to the second. Information transfer is widely seen as an important part of decentralized computation (Lizier et al., 2014), which in turn may be just what a network of processes representing an agent needs. Formally, we use an information-theoretic notion, the transfer entropy (Schreiber, 2000), to quantify (here only apparent) control. Control and transfer entropy have been linked in another context by Touchette and Lloyd (2004).
Note that the mechanism we treat is a requirement we introduce here in addition to the interaction closure property. In order to arrive at a complete agent, further mechanisms within larger networks are required. This will be investigated in future work.
The results in this paper show that the requirements of strong interaction closure and control from a pair of high-level processes put strong constraints on the dynamics of the underlying process. To arrive at these constraints we assume the ideal cases of both interaction closure and control. It should be seen as an advantage of the information-theoretic measures we employ that they are both "soft". This means they can readily be used to also quantify the degrees to which closure and control are present in a system.

Related work
In general, artificial agents have been studied using information-theoretic concepts by several authors (e.g. Klyubin et al. (2004); Lungarella et al. (2005); Bertschinger et al. (2008); Williams and Beer (2010); Zahedi and Ay (2013)). Many of those authors also employ Bayesian networks and specifically the perception-action loop (Klyubin et al., 2004; Bertschinger et al., 2008; Zahedi et al., 2009). The perception-action loop is a Bayesian network describing the causal relations between four stochastic processes representing environment, sensor, actuator, and memory (of the agent) states respectively. In these papers the perception-action loop is not seen as a network of high-level processes in our sense, since the interactions between the four processes are direct and not mediated by an underlying process.
As already mentioned, our notion of interaction closure is an extension of the concept of informational closure introduced by Bertschinger et al. (2006). The main difference is that we define interaction closure between two processes with respect to a third (the underlying one), while the original notion concerns closure of one process with respect to another only. We also use a stronger version of informational closure.
Conditions on underlying processes to exhibit "independence" of a high-level process from an underlying one have been studied for Markov chains at least since Kemeny and Snell (1976). They study lumpability, which requires that the high-level process is itself a Markov process. Research in this direction has been extended in Görnerup and Jacobi (2008); Jacobi and Görnerup (2009). Very recently, lumpability has been shown to be implied by informational closure by Pfante et al. (2014). In that work various other measures of level structure have also been thoroughly investigated. Interactional versions were not studied though.
Our notion of apparent control or information transfer is studied in the context of distributed computation in great detail by Lizier et al. (2014). It is argued there that information transfer (measured in the same way as here) is one of three ingredients needed for computation, the other two being information storage and information modification. Investigations into the computational capabilities of dynamical systems have a long history (e.g. Langton (1990); Mitchell et al. (1993); see Lizier et al. (2014) for more). As far as we know, the focus there has not been on the implications for the underlying process of computation occurring on a high level.

Artificial universe
We start by representing an isolated system (referred to as an artificial universe or the underlying process in the following) by a finite Markov chain1 {X_t}_{t∈I} on state space X defined by the time-homogeneous transition kernel (or Markov matrix)

p(x'|x) := p(X_{t+1} = x' | X_t = x).

Our assumption is that an isolated system should be Markov, as there is no external storage of information about past states. Choosing finiteness and time discreteness is done to reduce technical issues and improve clarity of the concepts; for the same reason we restrict ourselves to the stationary case in this treatment. Stationarity may often be a valid approximation for some time interval.
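Such an underlying process can be sketched numerically. The following is a minimal Python sketch; the 4-state matrix P is an arbitrary illustrative choice (not from this paper), with column x holding p(·|x) so that columns sum to one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-state artificial universe: column-stochastic Markov matrix,
# P[x_next, x] = p(x' | x); each column sums to one.
P = np.array([
    [0.5, 0.2, 0.0, 0.1],
    [0.5, 0.3, 0.0, 0.0],
    [0.0, 0.5, 0.4, 0.3],
    [0.0, 0.0, 0.6, 0.6],
])

def stationary(P):
    """Stationary distribution p(x): eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def sample_chain(P, x0, steps, rng):
    """Sample a trajectory of the time-homogeneous chain."""
    xs = [x0]
    for _ in range(steps):
        xs.append(int(rng.choice(P.shape[0], p=P[:, xs[-1]])))
    return xs

pi = stationary(P)
print(pi, sample_chain(P, 0, 10, rng))
```

For an irreducible chain like this one, the stationary distribution is unique and strictly positive, which is what the Bayesian inverse defined below relies on.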

High level processes
We call a random process {Y_t}_{t∈I} on state space Y a high-level process of {X_t}_{t∈I} if there is a kernel π^Y with

p(Y_t = y | X_t = x) = π^Y_{xy}.

Note that the transitions π^Y_{xy} are independent of time. See Fig. 1 for the corresponding causal Bayesian network2. We also define the Bayesian inverse:

p(x|y) = π^Y_{xy} p(x) / Σ_{x̄} π^Y_{x̄y} p(x̄),

where p(x) is the stationary distribution. For a detailed investigation of high-level processes see the work of Pfante et al. (2014). We also explicitly mention the deterministic case. Call a random process {Y_t}_{t∈I} on state space Y a deterministic high-level process of {X_t}_{t∈I} if

π^Y_{xy} = δ_{f(x),y}

for some function f : X → Y. Again transitions are independent of time. The Bayesian inverse reduces to:

p(x|y) = p(x) / p(f^{-1}(y)) for x ∈ f^{-1}(y), and 0 otherwise,

where p(f^{-1}(y)) = Σ_{x̄ ∈ f^{-1}(y)} p(x̄).
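The Bayesian inverse can be made concrete in a few lines of Python. This is a minimal sketch; the kernel piY, the assumed stationary distribution p_x, and the map f are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical high-level kernel: piY[y, x] = p(Y = y | X = x); columns sum to one.
piY = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.0, 0.2, 1.0, 1.0],
])
p_x = np.array([0.3, 0.2, 0.25, 0.25])  # assumed stationary distribution of X

# Bayesian inverse: p(x | y) = p(y | x) p(x) / sum_xbar p(y | xbar) p(xbar)
joint = piY * p_x                 # joint[y, x] = p(y, x)
p_y = joint.sum(axis=1)           # marginal p(y)
inverse = joint / p_y[:, None]    # inverse[y, x] = p(x | y)

# Deterministic special case: piY[y, x] = delta_{f(x), y}; the inverse then
# reduces to p(x) / p(f^{-1}(y)) on the preimage and zero elsewhere.
f = np.array([0, 0, 1, 1])        # illustrative f : X -> Y
piY_det = np.eye(2)[:, f]         # piY_det[y, x] = delta_{f(x), y}
joint_det = piY_det * p_x
inv_det = joint_det / joint_det.sum(axis=1)[:, None]
print(inv_det[0])  # → [0.6 0.4 0.  0. ]
```

Each row of `inverse` is a conditional distribution over X, so the rows sum to one by construction.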

Weak and strong informational closure
Informational closure was introduced by Bertschinger et al. (2006) to formalize the idea of closure known from systems theory (see references ibid.) within the framework of information theory. Loosely speaking, closure is attained by a system if it can be described without reference to the environment that it is part of (Bertschinger et al., 2006). We will distinguish between a weak and a strong form of informational closure. For a high-level process {Y_t}_{t∈I} and underlying process {X_t}_{t∈I} (Fig. 1), weak informational closure is defined by (see Pfante et al. (2014)):

I(Y' : X | Y) = 0,

where I(Y' : X | Y) is the conditional mutual information.
The conditional mutual information for three arbitrary random variables X, Y, Z is defined by

I(X : Y | Z) := Σ_z p(z) Σ_{x,y} p(x, y|z) log [ p(x, y|z) / ( p(x|z) p(y|z) ) ].

Intuitively one can read this as the amount of extra information Y contains about X that is not already in Z. So informational closure (Eq. 7) requires that the current high-level process state Y is as predictive with respect to the next high-level process state Y' as the current underlying process state X. Note that this condition can be made stronger by requiring that Y is even as predictive of Y' as the next underlying process state X'. This is expressed by what we will call strong informational closure:

I(Y' : X' | Y) = 0.
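The conditional mutual information can be computed directly from a joint distribution. A minimal sketch (the toy distribution is illustrative; it is built to be conditionally independent, so the result is exactly zero):

```python
import numpy as np

def cond_mutual_info(p_xyz):
    """I(X : Y | Z) in bits, from a joint array p_xyz[x, y, z]."""
    p_z = p_xyz.sum(axis=(0, 1))
    p_xz = p_xyz.sum(axis=1)   # p(x, z)
    p_yz = p_xyz.sum(axis=0)   # p(y, z)
    cmi = 0.0
    for x, y, z in np.ndindex(p_xyz.shape):
        pj = p_xyz[x, y, z]
        if pj > 0:
            cmi += pj * np.log2(pj * p_z[z] / (p_xz[x, z] * p_yz[y, z]))
    return cmi

# Sanity check: if X and Y are independent given Z, then I(X : Y | Z) = 0.
p = np.zeros((2, 2, 2))
for z in range(2):
    px = np.array([0.5, 0.5]) if z == 0 else np.array([0.9, 0.1])
    py = np.array([0.3, 0.7])
    p[:, :, z] = 0.5 * np.outer(px, py)
print(cond_mutual_info(p))  # → 0.0
```

Applied to the joint distribution of (Y', X, Y) of a candidate high-level process, a return value of zero is exactly the weak informational closure condition.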

Interaction closure
We now extend the concept of strong informational closure to two high-level processes. Given two high-level processes {Y_t}_{t∈I} and {Z_t}_{t∈I} and an underlying process {X_t}_{t∈I}, we say that we have strong interaction closure from {Y_t}_{t∈I} to {Z_t}_{t∈I} if

I(Z' : X' | Y) = 0.

This implies (see Appendix A) the weak interaction closure:

I(Z' : X | Y) = 0.

The idea behind interaction closure is that the states of one process are as predictive of the other's next states as the states (current or next, respectively) of the underlying process.

Apparent control
In order to measure in how far one high-level process {Y_t}_{t∈I} appears3 to control another high-level process {Z_t}_{t∈I} we use the one-step transfer entropy (Schreiber, 2000). Transfer entropy has been shown to be a measure of controllability by Touchette and Lloyd (2004).
Here we say that {Y_t}_{t∈I} appears to control {Z_t}_{t∈I} if the one-step transfer entropy is positive:

I(Z' : Y | Z) > 0.

We could also use the term "information transfer" as in Lizier et al. (2014) to put more emphasis on the relation to computation, but as control was the first thing we had in mind we stick to it in this publication4.
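The one-step transfer entropy is itself a conditional mutual information and can be computed from the joint distribution of (Z', Y, Z). A minimal sketch; the joint used here is an illustrative perfect-control example (Z' = Y regardless of Z), not taken from the paper.

```python
import numpy as np

def transfer_entropy(p_joint):
    """One-step transfer entropy T_{Y->Z} = I(Z' : Y | Z) in bits,
    from a joint array p_joint[z_next, y, z]."""
    p_z = p_joint.sum(axis=(0, 1))
    p_zn_z = p_joint.sum(axis=1)   # p(z', z)
    p_y_z = p_joint.sum(axis=0)    # p(y, z)
    te = 0.0
    for zn, y, z in np.ndindex(p_joint.shape):
        pj = p_joint[zn, y, z]
        if pj > 0:
            te += pj * np.log2(pj * p_z[z] / (p_zn_z[zn, z] * p_y_z[y, z]))
    return te

# Illustrative perfect apparent control: Z' = Y, with Y and Z uniform
# and independent; Y then explains all of H(Z' | Z).
p = np.zeros((2, 2, 2))
for y in range(2):
    for z in range(2):
        p[y, y, z] = 0.25
print(transfer_entropy(p))  # → 1.0
```

Here the transfer entropy attains its maximum H(Z'|Z) = 1 bit, matching the notion of perfect apparent control discussed below.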
Note that strong interaction closure does not imply apparent control: e.g. let {Y_t}_{t∈I} = {Z_t}_{t∈I}; then according to the definitions apparent control is zero even under strong interaction closure. This is due to the fact that apparent control is based on non-causal transfer entropy, and therefore a process can never (apparently) control itself.
We also use the definition of perfect apparent control (Touchette and Lloyd, 2004) to express the case where apparent control is maximal. Perfect apparent control means that for all initial states z ∈ Z and all final states z' ∈ Z there exists a state y ∈ Y such that

p(z'|z, y) = 1.

Then the transfer entropy attains its maximum value.

Implications of interaction closure
We now present the implications of strong interaction closure for the underlying process. In order to keep the necessary technical terminology to a minimum, we make a few more assumptions which lead to stronger results.
In the following we will denote the process from which the interaction closure "originates" by {S_t}_{t∈I} and the "receiving" one by {M_t}_{t∈I}. This is done to conform to an interpretation as a sensor that (apparently) writes or transfers information to a memory. In this case strong interaction closure reads:

I(M' : X' | S) = 0.

In Appendix B we show that under strong interaction closure and the two extra assumptions |M| = |S| and {M_t}_{t∈I} deterministic, i.e.

π^M_{xm} = δ_{f_M(x),m},

the following hold (see also Fig. 3): The process {S_t}_{t∈I} is also deterministic with respect to {X_t}_{t∈I}, and we have an associated function f_S : X → S.
Moreover, the next memory state is a deterministic function of the current underlying state: for each x ∈ X we have m' = f_{M'}(x) for some function f_{M'} : X → M, and f_{M'} = g ∘ f_S for some bijective function g : S → M. In particular, p(x'|x) can be non-zero only if f_M(x') = g(f_S(x)). We have thus arrived at a condition on the transition matrix of the artificial universe process from the requirement of strong interaction closure. There are two main things to take away from this.
The first is how to construct a transition matrix that obeys strong interaction closure. For this choose a finite set X with |X| = n. Then take two sets M and S with |M| = |S| and functions f_M : X → M and f_S : X → S. Then construct a matrix, split it vertically according to the preimages (f_S)^{-1} and horizontally according to those of (f_M)^{-1} (if for example the first and the last row are part of (f_M)^{-1}(m), make sure to remember they belong to the same block). Make sure that each column sums to one, and note that the entries in each column can only be larger than zero in one block of the preimage of (f_M)^{-1}.

The second is that we have two partitions on the state space X induced by the two functions f_M and f_{M'}. The former, (f_M)^{-1}, partitions X into blocks of states mapped to the same m ∈ M at the current time step, and we call it the current partition. The latter partitions X into blocks that are mapped to the same m ∈ M at the next time step, and we call it the future partition. Note that as g is bijective we can also view the future partition as induced by f_S = g^{-1} ∘ f_{M'}, which shows that s ∈ S indicates the block of the future partition at the current time step. Note that the time evolution starting in (s, m) would be (s, m), (s', g(s)), (s'', g(s')), .... Here s', s'', ... are determined by the underlying dynamics.
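The construction just described can be sketched in Python. This is a minimal sketch; the maps f_S, f_M and the bijection g are arbitrary illustrative choices (only the block structure of the columns matters).

```python
import numpy as np

rng = np.random.default_rng(1)

n = 6
f_S = np.array([0, 0, 1, 1, 0, 1])   # illustrative f_S : X -> S, |S| = 2
f_M = np.array([0, 1, 0, 1, 1, 0])   # illustrative f_M : X -> M, |M| = 2
g = np.array([1, 0])                 # illustrative bijection g : S -> M

# Column x may have nonzero entries only on rows x' with f_M(x') = g(f_S(x));
# within that block the weights are arbitrary, normalized to sum to one.
P = np.zeros((n, n))
for x in range(n):
    rows = np.where(f_M == g[f_S[x]])[0]
    w = rng.random(len(rows))
    P[rows, x] = w / w.sum()

# Verify the closure condition: supp p(.|x) lies in (f_M)^{-1}(g(f_S(x))).
ok = all(set(np.nonzero(P[:, x])[0]) <= set(np.where(f_M == g[f_S[x]])[0])
         for x in range(n))
print(ok, np.allclose(P.sum(axis=0), 1.0))  # → True True
```

By construction the next memory state f_M(X') is determined by g(f_S(X)), which is exactly the deterministic transition m' = g(s) derived above.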
The relation between the two partitions can take two extreme cases. The first is when they coincide, i.e. if for every m ∈ M there exists s ∈ S such that (f_M)^{-1}(m) ⊆ (f_S)^{-1}(s) and vice versa. The other extreme case is when they are orthogonal, i.e. when for every pair (m, s) ∈ M × S we have

(f_M)^{-1}(m) ∩ (f_S)^{-1}(s) ≠ ∅.

For coinciding partitions the blocks coincide and each block has unique associated high-level states s ∈ S and m ∈ M. This means that given s for a block, m is determined and vice versa. There is then a bijective function h : S → M which maps the current s to the current m (g maps it to m', the next high-level state). We can then write M = h(S) and S = h^{-1}(M); up to a change of alphabet the two processes are identical.
For orthogonal partitions, in every block of the current partition there is at least one element of every block of the future partition. This means that knowing only the block of the current partition, i.e. m ∈ M, does not tell us anything about the current s or the next m' = g(s).

Implications of apparent control and strong interaction closure
Here we only look at implications for apparent control under the same assumptions as in the last section.
Recall that apparent control is measured in this context by I(M' : S | M). We then have the current and the future partition of X. We consider the two extreme cases of coinciding and orthogonal partitions. For coinciding partitions, apparent control vanishes. To see this, recall that we have the bijective function h (see last section) such that S = h^{-1}(M), hence

I(M' : S | M) = I(M' : h^{-1}(M) | M) = 0,

since the random variable h^{-1}(M) can never contain more information than M itself.
If we look at the orthogonal case, we have that for every block of the current partition indicated by m ∈ M and every m' ∈ M there is an x ∈ X with f_M(x) = m and f_S(x) = s and g(s) = m'. But this just implies perfect apparent control, as in this case

p(m'|m, s) = 1.

So our measure of apparent control varies from 0 to its maximum H(M'|M), depending on the relation between the current and future partitions.
We can also ask whether perfect apparent control implies orthogonal partitions. As we need for every m, m' ∈ M an s ∈ S with

p(m'|m, s) = 1,

we can see that in every block of the current partition corresponding to m there must be elements x of the future partition (i.e. f_S(x) = s) that lead to each m'. Due to strong interaction closure and |S| = |M| we have a one-to-one relation between m' and s given by g, so there must be elements x corresponding to each s in each block of the current partition. This means the two partitions are orthogonal.
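The two extreme relations between the partitions can be checked mechanically. A small sketch, with illustrative partition maps on a 4-element state space (the maps are arbitrary choices, not from the paper):

```python
import numpy as np

f_M = np.array([0, 0, 1, 1])   # illustrative current partition of X, |X| = 4
f_S = np.array([0, 1, 0, 1])   # illustrative future partition (via f_S)

def orthogonal(f_M, f_S):
    """True iff every block of one partition meets every block of the other."""
    pairs = {(m, s) for m, s in zip(f_M, f_S)}
    return len(pairs) == len(set(f_M)) * len(set(f_S))

def coinciding(f_M, f_S):
    """True iff the two maps induce the same partition of X."""
    return len(set(zip(f_M, f_S))) == len(set(f_M)) == len(set(f_S))

print(orthogonal(f_M, f_S), coinciding(f_M, f_M))  # → True True
```

With the results above, `orthogonal` corresponds to maximal apparent control H(M'|M) and `coinciding` to vanishing apparent control; intermediate overlaps give intermediate values.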

Discussion
We were looking for design principles for artificial universes, especially with regard to the capability to contain artificial agents on a higher or macroscopic level. Conceptualizing artificial agents as networks of high-level processes, we focussed on the interaction of two such processes. To formalize the condition that there should be some explanatory power on the macroscopic level, we introduced interaction closure as an extension of informational closure.
We found that if we require interaction closure, equal cardinalities of the high-level processes' state spaces, and determinism of the receiving process, the dynamics of the underlying process must respect two partitions of the state space5 (see Eqs. 20, 21). How the two partitions are related is not determined by interaction closure. In other words, interaction closure does not specify the kind of interaction and requires only that it is closed with respect to the underlying process.
To design an underlying process we can then choose the partitions (which induce the two processes) freely and create the transition matrix accordingly (see Results). Considering that we can choose the underlying state space arbitrarily large, we expect that a large variety of high-level dynamics can be implemented in this way.
We also investigated a special kind of interaction, apparent control, between the high-level processes. It can be interpreted as one high-level process controlling the other, or as one process transferring information to the other. We identified two extreme cases. The first occurs when the two partitions associated with the interaction closure coincide: the two high-level processes are essentially the same, and apparent control vanishes. The second occurs when the two partitions are orthogonal: the two high-level processes are complementary, and control is maximal. Intermediate relations between the partitions would lead to intermediate levels of control.
In the future we want to investigate complete networks of high-level processes that are informationally and interactionally closed. Further interesting measures are the other ingredients of computation, information storage and modification, as well as their localized versions (Lizier et al., 2014). These are interesting to us because computation seems relevant for artificial agents. We also want to focus on network structures relevant for artificial agents with metabolisms.

Figure 1: Bayesian network representing one time step of the relationship between the underlying process {X_t}_{t∈I} and a high-level process {Y_t}_{t∈I}. The primed random variables represent the process state one time step after the unprimed ones.
Figure 2: Bayesian network representing one time step of an underlying process {X_t}_{t∈I} and high-level processes {Y_t}_{t∈I} and {Z_t}_{t∈I}. It follows from the definition of high-level processes that strong informational closure implies weak informational closure (see Appendix A). Note that none of these conditions actually change the causal structure of the Bayesian network.
Figure 3: Bayesian network representing one time step of an underlying process {X_t}_{t∈I} and high-level processes {S_t}_{t∈I} and {M_t}_{t∈I}. We indicate, for the case discussed in the Results section, the mechanisms associated with the transitions. Dashed arrows are not part of the Bayesian network (not causal). Note that δ_{f_S} is also associated with X → S and δ_{f_M} also with X' → M'. This is not indicated due to space limitations.

We call this apparent control because in our case the random variable Y is part of a high-level process and does not represent a true controller. The cause of the dynamics of {Z_t}_{t∈I} remains {X_t}_{t∈I}.