Emergence of Sense-Making Behavior by the Stimulus Avoidance Principle: Experiments on a Robot Behavior Controlled by Cultured Neuronal Cells

Robot experiments using real cultured neuronal cells as controllers are a way to explore the idea of embodied cognition. Real cultured neuronal cells have innate plasticity, and a sensorimotor coupling is expected to develop a neural circuit. Previous studies have suggested that a dissociated neuronal culture has two properties: i) modifiability of connection between neurons by external stimuli and ii) stability of the connection without external stimuli. If cultured neuronal cells are embodied by coupling to an environment, they learn to avoid external stimulation. We call this mechanism a “learning by stimulation avoidance” principle. We try to demonstrate that adaptive behavior, like wall avoidance, can emerge spontaneously from embodied cultured neuronal cells. In this study, we developed a system in which a robot moves in a real environment and is controlled by cultured neuronal cells growing on a glass plate. We used a high-density complementary metal-oxide-semiconductor array to monitor the neural dynamics. We then conducted a robotic experiment using this platform. The results showed that wall-avoidance behavior by a robot can be enhanced spontaneously without giving any reward from the external environment.


Introduction
Learning is a remarkable phenomenon in the neural system, and it is crucial for animals as embodied neural systems to learn adaptive behavior autonomously to survive. A key concept in studying adaptive behavior is homeostasis. Ashby argued that an adaptive behavior is just an outcome of a homeostatic property of a living system (Ashby (1960)). Di Paolo and Iizuka reported that adaptive behavior is an indispensable outcome of homeostatic neural dynamics (Di Paolo (2000); Iizuka and Di Paolo (2007); Di Paolo and Iizuka (2008)). Yet, those models are still too abstract to be tested in realistic situations.
Biological neural networks cultured in vitro can be used to study potential memory and learning by nervous systems. Using real biological neural networks is advantageous in that, for example, we can study potential complexity that may be difficult to implement in artificial neural networks. In this study, we use a dissociated cultured neural system as a model of a biological neural system. Although such cultured neural systems are much simpler than real brain systems, they retain some essential properties, including spontaneous activity, plasticity, and rich, complex controllability. Homeostatic control may be one such property.
It has become easier and more popular to study the coupling between cultured neuronal cells and external systems (DeMarse and Dockendorf (2005); Novellino et al. (2007); Pizzi et al. (2009); Warwick (2010)). In previous studies, cultured neurons were connected to an external system, such as a mobile robot in a real space. The sensory information coming from the external system was used to stimulate the neuronal cells, and the resulting neural activities controlled the external system. The resulting change in the external system fed back to the state of the neural cells, and this process could be repeated. We call this a "closed loop" and regard it as a model of primitive sensorimotor couplings. By studying such closed-loop systems, we can examine the biological memory, learning, and adaptability of a neural system with respect to embodiment.
Studies of closed-loop systems have been documented. For example, Bakkum et al. (2008) trained cultured neuronal cells to achieve a desired behavior with multiple stimulations. Hayashi et al. (2011) proposed another method that used a cultured neural system to incrementally learn to respond in a particular way to a particular input. One drawback of these studies is that they used a conventional microelectrode array as a recording device; this type of device lacks sufficient spatial resolution, so it is difficult to stimulate a single neuron and to detect its state accurately. To overcome this drawback, we use a recently developed device, a high-density microelectrode array based on complementary metal-oxide semiconductor (CMOS) technology, to detect the activities of individual neurons with high precision. The details of this recording device are described in the following sections. The other drawback of closed-loop studies is that an external evaluation function must be prepared and designed properly. A unique feature of our study is the examination of the self-development of such an evaluation function from the closed-loop system itself. We included this feature because we believe that self-development of an evaluation function is how adaptive behavior emerges spontaneously in most animals.

Atsushi Masumori, Norihiro Maruyama, Lana Sinapayen, Takeshi Mita, Urs Frey, Douglas Bakkum, Hirokazu Takahashi, Takashi Ikegami (2015)

A principle of our experimental study stems from the pioneering work of Shahaf and Marom (2001) on a cultured neural system. They argued that cultured neuronal cells have the following two characteristics: i) modifiability of connection between neurons by external stimuli and ii) stability of the connection without external stimuli. When a system is coupled to a body, these properties of neural cells lead to intelligent behavior. We rephrase the above properties in the following way:
1 External stimuli are provided to cultured neuronal cells.
2 The connections between neurons are changed by the external stimuli, and thus the behavior expressed through the external body also changes (modifiability).
3 If a behavior that can stop the external stimuli occurs, the stimulation ends and the current connections are stabilized (stability).
4 By repeating the above processes, behavior that can avoid the stimulation is improved.
In this way, behavior to avoid a stimulation can emerge spontaneously without any explicit reward or evaluation function. We call this a "learning by stimulation avoidance" (LSA) principle. LSA assures a homeostatic property as it sustains stability and variation simultaneously. Shahaf and Marom (2001) demonstrated that cultured neuronal cells can actually learn a desired activity pattern in a minimal closed-loop experiment in which electrical stimulation is applied to neuronal cells and removed when the network shows a desired activity pattern. In this example, although the experimenters explicitly specified the neural activity pattern that removed the stimulation, stimulation avoidance led to adaptive behavior. In the present study, we demonstrate that cultured neuronal cells, by coupling with a mobile robot, can learn an adaptive behavior through the LSA principle, without any explicit reward.
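The four steps above can be sketched as a toy loop. This is only an illustration of the logic of LSA, not a model of the culture: a single number `w` stands in for the network's connectivity, and the names `run_lsa_toy`, `escape_level`, and `noise` are hypothetical. The state is perturbed at random only while the stimulus is on, and is left untouched once the stimulus stops.

```python
import random


def run_lsa_toy(escape_level=0.8, noise=0.1, steps=5000, seed=0):
    """Toy illustration of learning by stimulation avoidance (LSA).

    `w` is a stand-in for connectivity (an assumption of this sketch).
    The "behavior" produced by `w` stops the stimulus only when
    w >= escape_level.
    """
    rng = random.Random(seed)
    w = 0.0
    stimulated_steps = 0
    for _ in range(steps):
        stimulated = w < escape_level  # step 1: stimulus is applied
        if stimulated:
            # step 2 (modifiability): stimulation perturbs the connection
            w = min(1.0, max(0.0, w + rng.uniform(-noise, noise)))
            stimulated_steps += 1
        # step 3 (stability): without stimulation, w is left unchanged
    # step 4: only a state that avoids the stimulus persists
    return w, stimulated_steps


final_w, n_stimulated = run_lsa_toy()
print(final_w, n_stimulated)
```

No reward signal appears anywhere in the loop; the only asymmetry is that change happens under stimulation and stops without it.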

Dissociated neuronal culture
The neural cultures were prepared from the cerebral cortex of E18 Wistar rats. The cortex region was trypsinized with 0.25% trypsin, and the dissociated cells were plated and cultured on a recording device. The surface of the electrodes on the device was coated with 0.05% polyethylenimine and laminin to improve plating efficiency. The cells were cultured in Neurobasal Medium (Life Technologies) containing 10% L-Glutamine (Life Technologies) and 2% B27 supplement (Life Technologies) for the first 24 h. Half of the plating medium was replaced with growth medium (Dulbecco's modified Eagle's medium (Life Technologies) containing 10% horse serum, 0.5 mM GlutaMAX (Life Technologies), and 1 mM sodium pyruvate) after the first 24 h. The cultures were placed in an incubator at 37 °C with an H2O-saturated atmosphere consisting of 95% air and 5% CO2. During cell culturing, half of the medium was replaced once after several days with the growth medium.

High-density microelectrode array
A high-density CMOS electrode array (Frey et al. (2010)) was used for measuring the extracellular electrophysiological activity of the cultured neurons (Figure 1). This CMOS array is superior to the conventional multielectrode array (MEA) used previously (Potter and DeMarse (2001); Eytan and Marom (2006); Madhavan et al. (2007)) in that it has far higher spatio-temporal resolution. The number of electrodes in conventional MEAs is small, usually about 64, and the locations of the recording electrodes are predetermined with an inter-electrode distance of about 200 µm; thus, it is difficult to identify signals from an individual cell. In contrast, the CMOS array has 11,011 electrodes. The diameter of each electrode is 7 µm, with an inter-electrode distance of 18 µm, over an area of 1.8 mm × 1.8 mm. It can record electrical activity on 126 electrodes at one time at a sampling rate of 20 kHz.

Processing before and during recording of neural activity
Before recording the neural activities, we scanned almost all the 11,011 electrodes on the CMOS array to obtain an electrical activity map for estimating the locations of the neuronal somata (i.e., identifying the positions of neural cells). In each of the 95 recording sessions, the electrical activities were recorded for 60 s with about 110 electrodes at the same time. An electrical activity map was obtained by averaging the height of the action potentials for each electrode. We applied a Gaussian filter to the map and assumed that the neuronal somata were located near the local peaks in the Gaussian-filtered map. About 120 of the highest peaks were selected as the positions of neural cells, and the electrodes nearest to the peaks were selected for recording that neural activity. If the number of local peaks was fewer than 126, then all the peaks were used. With this method, one electrode can ideally represent the state of a single neuron. Each neuronal cell was classified as excitatory or inhibitory on the basis of the spike time series recorded for 10 min before the main experiment. Because the shapes of the action potentials of these two neural types differ, we classified the type of neuronal cell by using k-means clustering.
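The site-selection step described above (smooth the activity map, find local peaks, keep the highest ones) can be sketched as follows. This is a minimal illustration under stated assumptions: the map is treated as a small rectangular grid, a fixed 3×3 kernel with 1-2-1 weights stands in for the Gaussian filter, and the function names are hypothetical rather than part of the recording software.

```python
def smooth3x3(grid):
    """Smooth a 2-D map with a 3x3 Gaussian-like kernel (1-2-1 weights).

    Border cells reuse the nearest valid cell (index clamping).
    """
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += kernel[dr + 1][dc + 1] * grid[rr][cc]
            out[r][c] = total / 16.0
    return out


def local_peaks(grid):
    """Return (value, row, col) for cells strictly above all their neighbours."""
    rows, cols = len(grid), len(grid[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(grid[r][c] > n for n in neighbours):
                peaks.append((grid[r][c], r, c))
    return peaks


def select_recording_sites(activity_map, max_sites=126):
    """Pick up to max_sites grid positions at the highest smoothed peaks."""
    peaks = local_peaks(smooth3x3(activity_map))
    peaks.sort(reverse=True)  # highest peak first
    return [(r, c) for _, r, c in peaks[:max_sites]]


# Tiny example map: mean spike heights with two active spots.
example = [[0.0] * 5 for _ in range(5)]
example[1][1] = 10.0
example[3][3] = 5.0
print(select_recording_sites(example))
```

If fewer peaks than `max_sites` are found, the slice simply returns all of them, which matches the fallback described in the text.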
For detecting and recording the spikes of cultured neurons, we used the MEABench software developed by Wagenaar et al. (2005). All recordings were performed at a 20-kHz sampling rate using the real-time spike detection algorithm LimAda in MEABench. Because the LimAda algorithm detects any spike that exceeds the threshold without distinguishing positive from negative values, unexpected double detection of spikes can occur. These double-detected spikes were removed from the data before analysis. Sending electrical stimuli to a neuronal cell through the electrodes can also produce artifacts. In a robotic experiment, we need to detect action potentials and stimulate the cultured neuronal cells at the same time. The SALPA filter in MEABench was used to remove these artifacts in real time (Wagenaar and Potter (2002)).
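One simple way to remove double detections is to drop any spike that follows the previously kept spike on the same channel too closely. The text does not state the criterion that was actually used, so the dead time of 1 ms and the function name below are assumptions made for illustration only.

```python
def remove_double_detections(spike_times_ms, dead_time_ms=1.0):
    """Keep a spike only if it is at least dead_time_ms after the last kept one.

    spike_times_ms: spike times (ms) from a single channel, in any order.
    dead_time_ms: assumed minimum spacing between distinct spikes.
    """
    kept = []
    for t in sorted(spike_times_ms):
        if not kept or t - kept[-1] >= dead_time_ms:
            kept.append(t)
    return kept


print(remove_double_detections([10.0, 10.4, 25.0, 25.3, 40.0]))
```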

A closed-loop system
We implemented a closed-loop system between the cultured neural cells and a robot. This system mainly consisted of three components: a recording system monitoring the cultured neurons, a mobile robot as an external body of the cultured neurons, and the interface connecting them. The current system setup is depicted in Figure 2.
We used the CMOS array and the MEABench software for recording and stimulating neural cells. Elisa-3 (GCtronic, Ticino, Switzerland) was used as the mobile robot. Elisa-3 is a small circular robot of 2.5 cm radius with two independently controllable wheels. The front right and left distance sensors were used as sensory signals for stimulating the neuronal cells. The refresh rate of the robot was set at 10 fps. The interface receives a sensor value from the robot and stimulates the neuronal cells through the CMOS array based on that value. The interface also receives detected spike data from the CMOS array in real time, calculates a wheel speed based on the spike data, and sends it to the robot. In this way, the robot and the neuronal cells form a closed loop. More details of the sensorimotor mapping are described in the following section.

Sensorimotor mapping
A simple sensorimotor mapping was applied to the robot and neuronal cells on the CMOS array (Figure 3). We selected two electrodes that were estimated to be at excitatory neurons as the left- and right-input neurons for sending the electrical stimuli. At given time intervals (100 ms), the probability $P_{L,R}$ of sending an electrical stimulation to the input neuron was controlled by the sensory value of the mobile robot. More precisely, the probability is calculated as follows: if the sensor value $S_{L,R}$ is less than a threshold $T$, $P_{L,R}$ is zero; otherwise $P_{L,R} = S_{L,R}/S_{max}$, where $S_{max}$ is the maximum value of the sensor input. Whether an electrical stimulation is sent to the input neuron is determined with this probability every 100 ms.
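The stimulation rule above amounts to a thresholded, normalized probability evaluated once per 100-ms step. A minimal sketch follows; the function names and the numeric sensor values in the example are hypothetical, since the threshold $T$ and $S_{max}$ are not given in the text.

```python
import random


def stimulation_probability(sensor_value, threshold, sensor_max):
    """P = 0 below the threshold T, otherwise S / S_max (capped at 1)."""
    if sensor_value < threshold:
        return 0.0
    return min(sensor_value / sensor_max, 1.0)


def should_stimulate(sensor_value, threshold, sensor_max, rng=random):
    """Draw once per 100-ms step: True means send one stimulus."""
    p = stimulation_probability(sensor_value, threshold, sensor_max)
    return rng.random() < p


# Example with made-up sensor units: threshold 50, maximum 1000.
print(stimulation_probability(10, 50, 1000))
print(stimulation_probability(500, 50, 1000))
```

The same function would be called separately for the left and the right sensor, each driving its own input neuron.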
We also selected 20 electrodes, 10 of which were left-output neurons and the other 10 right-output neurons; all 20 were in the vicinity of the respective input neuron and were used for calculating the left- and right-wheel speeds. The wheel speeds were calculated based on the number of spikes of the output neurons, integrated every 100 ms. We calculated the left- and right-wheel speeds $V_{l,r}$ as follows:

$$V_{l,r} = \sum_{i \in N_{l,r}} \omega_i v_i + C$$

The virtual neural states $v_i$ take non-negative integer values equal to the number of spikes of the output neurons over a given time interval, and they are summed with the fixed weights $\omega_i$. Finally, a positive constant $C$ is added as a default wheel speed. $N_l$ and $N_r$ are the sets of channel numbers of the left- and right-output neurons. Here, as $\omega_i$ is negative and $C$ is positive, the robot moves forward when the output neurons are not active. As the activities of the output neurons increase, the speed of the forward movement decreases and finally the robot moves backwards. As the two wheels of the robot are independent, the robot can also turn.
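The wheel-speed rule described above is a weighted sum of spike counts plus a default speed. A small sketch, with illustrative numbers only: the weights, the default speed, and the function name are assumptions, since the text gives only their signs.

```python
def wheel_speed(spike_counts, weights, default_speed):
    """V = sum_i(w_i * v_i) + C for one wheel.

    spike_counts: spikes per output neuron in the last 100-ms window (v_i).
    weights: fixed negative weights (w_i), one per output neuron.
    default_speed: positive constant C, the speed when no neuron fires.
    """
    return default_speed + sum(w * v for w, v in zip(weights, spike_counts))


weights = [-0.5] * 10  # hypothetical values, negative as in the text

print(wheel_speed([0] * 10, weights, 5.0))  # silent neurons: forward at C
print(wheel_speed([2] * 10, weights, 5.0))  # strong activity: backwards
```

Because each wheel has its own set of output neurons, unequal activity on the two sides yields unequal wheel speeds and therefore a turn.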
Figure 3: Sensorimotor mapping between a robot and cultured neuronal cells. Two electrodes on the CMOS array are selected as input neurons and connected to the distance sensors of the mobile robot. Twenty electrodes are selected as output neurons and connected to the left- and right-wheel speeds of the mobile robot.

Results
A robot was placed in the 60 cm × 60 cm arena (Figure 4), and both the neural activities and the behavior of the robot were recorded for 1 h using cultured neuronal cells under two different conditions (Chip#1 [DIV 28] and Chip#2 [DIV 38]), where Chip#1 is a culture 28 days after sowing and Chip#2 is a culture 38 days after sowing. In previous research, we studied the difference in neural behavior stemming from the different conditions (Matsuda et al. (2013)). The neural spiking patterns were also recorded for 1 h before and 1 h after the coupling experiment between the robot and the neural cells. The trajectories were tracked from a video recording using the open-source software SwisTrack (Correll et al. (2006)). We also recorded the right- and left-sensor input values of the robot.
In the following section, we examine whether wall-avoidance behavior is improved autonomously by analyzing the trajectory and sensor values of the robot. We then analyze neural connectivity to support the improvement in wall-avoidance behavior.

Evaluation of wall-avoidance behavior
We focused on whether the mobile robot could improve wall-avoidance behavior autonomously. Figure 5 shows a trajectory of the robot in the experiment with Chip#1. Qualitatively it appears that the activity pattern changed, which we took as a sign of the modifiability of the networks. Although a quantitative evaluation of the behavior is needed, it is difficult to track the direction of the robot from the video data. Thus, we used wall-collision time for evaluating the wall-avoidance behavior. This is defined as the duration between the time at which a sensory input value exceeds the upper threshold and the time at which the sensory input value falls below the lower threshold (Figure 6). When the robot collides with a wall or stands close to it, the sensor becomes activated; otherwise it receives a weaker signal. Therefore, if the estimated wall-collision time becomes shorter, we can conclude that the wall-avoidance behavior was enhanced.
Figure 6: Definition of the estimated wall-collision time. The wall-collision time is defined as the duration between the time at which a sensor-input value exceeds an upper threshold and the time at which a sensor-input value falls below the lower threshold.
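The definition above is a two-threshold (hysteresis) rule and can be sketched directly. The threshold values in the example are made up, the function name is hypothetical, and the 0.1-s time step is an assumption based on the robot's 10 fps refresh rate.

```python
def wall_collision_times(sensor_values, upper, lower, dt=0.1):
    """Durations (s) from crossing above `upper` until dropping below `lower`.

    sensor_values: one distance-sensor reading per time step.
    dt: assumed seconds per step (0.1 s for a 10 fps refresh rate).
    An episode still open at the end of the series is ignored.
    """
    durations = []
    start = None
    for k, s in enumerate(sensor_values):
        if start is None and s > upper:
            start = k  # the robot has reached a wall
        elif start is not None and s < lower:
            durations.append((k - start) * dt)  # the robot has left it
            start = None
    return durations


readings = [0, 0, 900, 950, 600, 300, 100, 0, 920, 50]
print(wall_collision_times(readings, upper=800, lower=200))
```

Using two thresholds instead of one avoids counting a single wall contact several times when the reading fluctuates around a single cutoff.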
Figure 7 shows the time series of the estimated wall-collision time. Here, relaxation time means the time needed for the number of stimulation-induced spikes of the input neurons to stabilize. The number of stimulation-induced spikes is defined as the number of spikes of input neurons within 100 ms after each stimulation. Figure 8 shows the time series of the number of stimulation-induced spikes, which stabilized at around 1,300 s. Thus, in this case, the relaxation time was set to 1,300 s. Based on the time series of the estimated wall-collision time, the wall-avoidance behavior was not improved across the board, yet the estimated wall-collision time of at least either the right or the left sensor gradually decreased. In Figure 7(a), the estimated wall-collision time of the right-sensor input gradually decreased, and in Figure 7(b), that of the left-sensor input gradually decreased, indicating a partial improvement in the wall-avoidance behavior. We can therefore conclude that the LSA principle is working in this setup.
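Counting stimulation-induced spikes as defined above means counting, for each stimulus, the spikes that fall in the 100 ms that follow it. A minimal sketch with hypothetical names; whether the window boundaries are inclusive is not stated in the text, so the choice below (after the stimulus, up to and including 100 ms) is an assumption.

```python
from bisect import bisect_right


def stimulation_induced_spikes(stim_times_ms, spike_times_ms, window_ms=100.0):
    """Number of spikes in (t, t + window_ms] for each stimulus time t."""
    spikes = sorted(spike_times_ms)
    counts = []
    for t in stim_times_ms:
        first = bisect_right(spikes, t)             # spikes strictly after t
        last = bisect_right(spikes, t + window_ms)  # spikes up to t + window
        counts.append(last - first)
    return counts


print(stimulation_induced_spikes([0, 200], [5, 50, 100, 150, 210, 320]))
```

Plotting these counts against stimulus time gives the kind of series from which a relaxation time can be read off.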

Analyzing dynamics of functional connectivity
We also analyzed the change in functional connectivity. A correlation between a pair of neurons may not represent a physical connection but a functional connection. Several methods are used for estimating the strength of a functional connection between neurons, including mutual information and transfer entropy (Schreiber (2000); Matsuda et al. (2013)). In this study, we used conditional firing probability (CFP) for detecting functional connectivity by using the cross-correlation between neural states (le Feber et al. (2007)). This is defined as follows:

$$\mathrm{CFP}_{i,j}(\tau) = \frac{\sum_{t} X_i(t)\, X_j(t+\tau)}{\sum_{t} X_i(t)}$$

$X_i$ and $X_j$ are binary arrays of firing in which 0 represents no firing and 1 represents at least one firing at electrode $i$ or $j$, respectively, within the given time window. Thus, $\mathrm{CFP}_{i,j}(\tau)$ represents the firing rate at electrode $j$ at a delay time $\tau$ ($0 < \tau \le 500$ ms) after the firing at electrode $i$, divided by the total number of firings at electrode $i$. The interval of $\tau$ is 1 ms. Figure 9 shows an example of a CFP curve fitted by the following equation using the nonlinear least squares method to minimize the mean squared error.
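A fitting function consistent with the parameters described here is the Lorentzian-shaped curve used by le Feber et al. (2007). It is written out below as an assumption about the exact form, since only its parameters ($M_{i,j}$, $T_{i,j}$, $\omega_{i,j}$, and the offset) are described in this section.

```latex
\mathrm{CFP}^{fit}_{i,j}(\tau) =
  \frac{M_{i,j}}{1 + \left( \dfrac{\tau - T_{i,j}}{\omega_{i,j}} \right)^{2}}
  + \mathrm{offset}_{i,j}
```

With this form the curve equals $M_{i,j} + \mathrm{offset}_{i,j}$ at $\tau = T_{i,j}$ and decays toward $\mathrm{offset}_{i,j}$ away from it, with $\omega_{i,j}$ setting the width of the peak.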
$M_{i,j}$ represents the maximum value above the offset, and $T_{i,j}$ represents the time at which the fitted CFP function reaches its maximum value. The shape of the curve is determined by the parameter $\omega_{i,j}$. The offset $\mathrm{offset}_{i,j}$ reflects unrelated background activity. In this study, if $M_{i,j}$ is two times greater than the $\mathrm{offset}_{i,j}$ level and $T_{i,j}$ is larger than zero and does not exceed 250 ms, electrodes $i$ and $j$ are regarded as functionally connected, and $M_{i,j}$ is regarded as the estimated strength of that functional connection. Using this CFP method, we calculated the strength of functional connectivities between the electrodes and compared the pre-experiment and post-experiment values to evaluate the changes in functional connectivity. We also compared these with the results of an open-loop experiment. Here, an open-loop experiment is one in which the stimulation is sent to the input neuron with the same timing as in the robot experiment, but the cultured neuronal cells are not connected to the robot, so there is no feedback from neural activity to the stimulation pattern. The open-loop experiments were conducted using the same chips as the closed-loop experiments. The results showed that the mean strength of the functional connections in the post-experiment was significantly greater than that in the pre-experiment (p < 0.05, Wilcoxon rank-sum test) in the closed-loop experiment, but that there was no significant difference in the open-loop experiment (Figure 10). The results of the closed-loop experiment showed that the functional connectivity between the input neuron and the output neurons increased, as did the synchronization between the two neuronal states. A simple way to improve wall-avoidance behavior in this experimental setup is to have many output neurons fire at approximately the same time after the stimuli. If this happens, we can conclude that wall-avoidance behavior by the LSA principle has improved (see also Sinapayen et al. (2015) for a discussion of LSA with artificial neural network experiments).
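The CFP computation and the acceptance rule for a functional connection can be sketched as below. This is a plain-Python illustration under stated assumptions: spike trains are already binned into 1-ms binary arrays of equal length, the function names are hypothetical, and the curve fit that yields the peak, offset, and peak time is left out.

```python
def conditional_firing_probability(x_i, x_j, max_delay=500):
    """CFP_{i,j}(tau) for tau = 1..max_delay (in 1-ms bins).

    x_i, x_j: binary lists of equal length (1 = at least one spike in the bin).
    Returns a list whose element tau-1 is the fraction of firings at i that
    were followed by a firing at j exactly tau bins later.
    """
    n_i = sum(x_i)
    if n_i == 0:
        return [0.0] * max_delay
    cfp = []
    for tau in range(1, max_delay + 1):
        hits = sum(x_i[t] * x_j[t + tau] for t in range(len(x_i) - tau))
        cfp.append(hits / n_i)
    return cfp


def is_functional_connection(peak, offset, peak_time_ms):
    """Rule from the text: M > 2 * offset and 0 < T <= 250 ms."""
    return peak > 2 * offset and 0 < peak_time_ms <= 250


# Neuron j fires exactly 2 ms after every firing of neuron i.
x_i = [1, 0, 0, 0, 1, 0, 0, 0]
x_j = [0, 0, 1, 0, 0, 0, 1, 0]
print(conditional_firing_probability(x_i, x_j, max_delay=3))
```

In practice `peak`, `offset`, and `peak_time_ms` would come from fitting a curve to the CFP values rather than from the raw values themselves.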

Discussion
We usually train a robot to maximize a reward function by either on-line or off-line learning. Without any reward function, it is difficult to self-organize sense making or a goal-oriented behavior. Such autonomous robots scarcely exist; the present study takes on this problem.
Hanczyc and Ikegami (2010) demonstrated a simple but spontaneous sense-making behavior with a simple chemistry. Water at a high pH reacting with oleic anhydride generates self-moving droplets, which maintain the reaction on their surface, sustaining their self-mobility (Toyota et al. (2009); Hanczyc and Ikegami (2010)). As a result, a droplet climbs up the pH gradient when the environmental pH is 12 and turns away from it when the pH is 10. This pseudo-chemotaxis of oil droplets was easily introduced by an internal chemical reaction plus embodiment. This behavior pattern feeds back into the chemical reaction of a droplet to sustain its activity.
We aimed to demonstrate that a tendency to escape from a stimulus generates autonomous motion and in the end leads to a goal-oriented behavior. We call this LSA. We aimed to evaluate this learning hypothesis with a simple robot experiment. In this study, we implemented a closed-loop system for connecting a robot and cultured neuronal cells and conducted the experiment using this system. The results showed that wall-avoidance behavior is partially improved, and we consider that such sense-making behavior can be enhanced autonomously even though there are no explicit external rewards. We argue that this emerging sense-making behavior is an outcome of integrating embodiment, the environment, and adequate sensors. In this paper we focused on wall-avoidance behavior; however, we hypothesize that other sense-making behaviors can emerge from other couplings of embodiment and environment.
Previously, Di Paolo (2000), Iizuka and Di Paolo (2007), and Di Paolo and Iizuka (2008) studied the emergence of sense-making behavior as an outcome of neural homeostasis. Ikegami (2013) developed an experimental art system (called "Mind Time Machine" [MTM]) consisting of three screens and 15 video cameras. In MTM, each camera iteratively projects images taken from those screens over 3 months. The captured images are edited and modified by the underlying artificial neural dynamics. The complexity of the environment and the stored image memories of MTM create a life-like impression for the people who interact with MTM. MTM created a sense-making behavior without any reward function.
The emergence of sense-making behavior has long been a major theme for artificial life and should be explored further. A drawback of the robot-neural platform is that it uses the entire network to produce one style of behavior. This can be improved by i) promoting modular networks for learning and ii) storing and retrieving many memories. However, we primarily need more studies to show how LSA works in practice for evolving sense-making behavior. Furthermore, we hypothesize that the LSA principle can be generalized not only to cultured neural networks but also to artificial neural networks and to any system whose elements have modifiability and stability, for example, a microorganism that has no nervous system.

Figure 1: The high-density CMOS electrode array used in this experiment. This recording device has 11,011 recording sites, each with a diameter of 7 µm, and an inter-electrode distance of 18 µm.

Figure 2: Overview of the closed-loop system composed of the high-density CMOS electrode array monitoring the culture of neuronal cells, a mobile robot, and the interface connecting them.

Figure 4: Experiment environment. The robot is placed in the square arena (60 cm × 60 cm).

Figure 5: Trajectory of a robot in the Chip#1 experiment. The left panel shows the trajectory of the first 15 min in the robot experiment; the right panel shows the trajectory of the last 15 min.

Figure 7: Estimated wall-collision time. (a) Results of Chip#1. (b) Results of Chip#2. Each left figure shows the time series of the wall-collision time estimated from the left-sensor input of the robot. Each right figure shows the time series of the wall-collision time estimated from the right-sensor input of the robot. The blue dotted line represents the relaxation time for stabilizing the stimulation-induced spikes of the input neuron.

Figure 8: Time series of the number of stimulation-induced spikes of the input neuron in Chip#1. The number of stimulation-induced spikes is defined as the number of spikes of input neurons within 100 ms after each stimulation. In this case, the number of stimulation-induced spikes stabilized at around 1,300 s, indicating the total relaxation time.

Figure 9: An example of the conditional firing probability (CFP). The gray line represents the CFP curve. The red line represents the curve fitted to the CFP curve. $M_{i,j}$ represents the maximum value above the offset, $T_{i,j}$ represents the time at which the fitted CFP function reaches its maximum value, and the offset represents the unrelated background activity.

Figure 10: Comparison of the estimated strength of a functional connection. Upper figures show the results of Chip#1, and lower figures show the results of Chip#2. Each left figure is a closed-loop experiment (robot experiment), and each right figure is an open-loop experiment. In the closed-loop experiment, the estimated strength of the functional connection in the post-experiment is significantly greater than that of the pre-experiment (*p < 0.05, Wilcoxon rank-sum test), but there is no significant difference in the results of the open-loop experiment.
Emergence of Sense-Making Behavior by the Stimulus Avoidance Principle: Experiments on a Robot Behavior Controlled by Cultured Neuronal Cells.. Proceedings of the European Conference on Artificial Life 2015, pp.373-380 cluded this feature because we believe that self-development of an evaluation function is how adaptive behavior emerges spontaneously with most animals.