Embodiment dictates learnability in neural controllers

Catastrophic forgetting continues to severely restrict the learnability of controllers suitable for multiple task environments. Efforts to combat catastrophic forgetting reported in the literature to date have focused on how control systems can be updated more rapidly, hastening their adjustment from good initial settings to new environments, or more circumspectly, suppressing their ability to overfit to any one environment. When using robots, the environment includes the robot's own body, its shape and material properties, and how its actuators and sensors are distributed along its mechanical structure. Here we demonstrate for the first time how one such design decision (sensor placement) can alter the landscape of the loss function itself, either expanding or shrinking the weight manifolds containing suitable controllers for each individual task, thus increasing or decreasing their probability of overlap across tasks, and thus reducing or inducing the potential for catastrophic forgetting.


I. INTRODUCTION
It has been shown in various single-task settings how an appropriate robot design can simplify the control problem [18,27,4,2,17,22], but because these robots were restricted to a single training environment, they did not suffer catastrophic forgetting.
Catastrophic forgetting is a major and unsolved challenge in the machine learning literature [9,11,15,20]. Regardless of learning algorithm or task domain, a neural network trained to perform task A and then challenged with learning task B as well usually forgets A at the same rate as it learns B. Such interference can also occur when an agent attempts to learn tasks A and B simultaneously if gradients of improvement in A lead away from those of B.
In a multitask setting, Powers et al. [23] recently demonstrated that certain body plans suffer catastrophic forgetting, while others do not. It was hypothesized that a robot with the right morphology could in some cases alias separate tasks: certain designs are able to move in such a way that a seemingly different training instance converges sensorially to a familiar instance. However, this conjecture was not isolated and tested. Likewise, the relationship between the body and the loss landscape was not investigated.
In this paper, we provide a more thorough investigation of the role of embodiment in catastrophic forgetting, based on the assumption that, in order to avoid catastrophic forgetting, there must exist a set of control parameters that is adequately performant across multiple task environments simultaneously. Since a robot's mechanical design can influence the set of controller parameters suitable for each individual task environment, we here test the hypothesis that a specific physical property of the robot's design (namely, the location of its sensors along its body) can help or hinder continual learning by allowing for more or less overlap in suitable parameter settings across multiple task environments.
Using a simple yet embodied agent as our model, we analytically and empirically investigate how sensor location affects the weight manifolds of the neural controller over multiple tasks. We show how morphological optimization often results in asymmetrical and unintuitive sensor arrangements with much more potential to allow learning algorithms to avoid catastrophic forgetting than more intuitive, symmetrical designs. Thus, human designer bias, while often useful, can sometimes inadvertently increase the likelihood of catastrophic forgetting during learning. This suggests that we should scrutinize our prior assumptions about the body plan of robots challenged with continual learning, and where possible replace them with end-to-end data-driven design automation.

II. METHODS
A. The robot.
The robot has a square frame, two separately driven wheels, and two infrared sensors (Fig. 1). The sensors detect light according to the inverse square law, i.e., $1/d^2$, where $d$ is the distance from the light source; occlusion was not modeled. The motors driving the wheels are contralaterally connected to the sensors by weighted synapses, yielding two trainable parameters $w_1, w_2 \in [-1.0, 1.0]$.
We here consider changes to a single, isolated morphological attribute: the physical location of the two sensors, which can be placed anywhere on the dorsal surface of the robot's square body. The location of the $i$-th sensor, $\ell_i$, can be described by its Cartesian coordinates $\ell_i = (x, y)$, where $x, y \in [-0.5, 0.5]$ and $(0, 0)$ denotes the center of the body (Fig. 1B).
The effect of sensor location $\ell_i$ can be measured with respect to the space, denoted $\theta$, of possible synapse weight pairs $(w_1, w_2)$. Since we cannot perform an exhaustive sweep over the infinitude of possible sensor positions, we discretized each dimension of $\ell_i$ into nine uniformly spaced bins. Because sensors are varied in two dimensions ($x$ and $y$), there are $9^2 = 81$ possible locations for each sensor; and because there are two such sensors, the design space is discretized into an 81-by-81 uniformly spaced grid, yielding a searchable space of $81^2 = 6561$ possible robot designs.
Fig. 1. A: The effect of lateral and contralateral synaptic connections (adapted from [3]). B: The theoretical model with sensor positions determined by $\ell_1$ and $\ell_2$. C: The simulated robot with two light sensors (red), two motorized wheels (black), and a passive, anterior castor wheel for balance (gray). The robot is drawn (A-C) with symmetrical, anteriormost sensor placement, which we refer to in this paper as the "canonical design".

arXiv:1910.07487v2 [cs.LG] 6 May 2021

For each of the 6561 designs, we conducted another sweep over the synapse weights $(w_1, w_2)$, likewise discretizing each weight into 121 evenly spaced values, yielding $121^2 = 14641$ different weight configurations. Finally, for each of the $6561 \times 14641 = 96059601$ evaluated combinations of sensor locations and weight values, we analyzed the robot analytically using differential equations and empirically using a physics engine. These discretizations were chosen to be as fine as possible within the limits of our computational resources and time.
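The size of the nested sweep described above can be checked with a short enumeration. The following Python sketch is illustrative only: the helper `linspace` and the choice to represent each bin by evenly spaced points that include both endpoints are assumptions, since the text specifies only the number of bins and weight values.

```python
from itertools import product


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi, endpoints included."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# ASSUMPTION: each of the nine bins is represented by one grid point,
# with the first and last points on the edges of the body.
coords = linspace(-0.5, 0.5, 9)

# Every (x, y) placement available to a single sensor.
locations = list(product(coords, coords))

# Every pair of placements for the two sensors, i.e. one robot design.
designs = list(product(locations, locations))

# Each synapse weight takes one of 121 values in [-1, 1].
weights = linspace(-1.0, 1.0, 121)
n_controllers = len(weights) ** 2

total = len(designs) * n_controllers
print(len(locations), len(designs), n_controllers, total)
```

The four printed counts correspond to the sensor locations, designs, weight configurations, and total evaluations quoted in the text.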
B. The task environments.
The task is phototaxis in four environments, which differ in the position of the light source relative to the robot. The light source is placed at polar coordinates $(r, \phi)$, where $\phi \in \{45^\circ, 135^\circ, 225^\circ, 315^\circ\}$ and $r$ is a fixed distance. A controller is considered successful in a given environment if the robot comes within 0.2 cm of the light source at any time during its evaluation period.
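As a minimal sketch of this setup, the four light positions and the success test can be written as follows. The function names are hypothetical, the value passed for `r` in the usage line is a placeholder (the text states only that $r$ is fixed), and the trajectory is assumed to be a sequence of sampled robot positions.

```python
import math

ANGLES_DEG = (45, 135, 225, 315)  # one light-source angle per environment


def light_positions(r):
    """Cartesian light positions for the four environments at radius r."""
    return [
        (r * math.cos(math.radians(a)), r * math.sin(math.radians(a)))
        for a in ANGLES_DEG
    ]


def reached(trajectory, light, tol=0.2):
    """True if any sampled robot position lies within tol of the light."""
    lx, ly = light
    return any(math.hypot(x - lx, y - ly) < tol for x, y in trajectory)


# ASSUMPTION: r = 1.0 is a placeholder, not the distance used in the paper.
lights = light_positions(1.0)
print(reached([(0.0, 0.0), (0.65, 0.65)], lights[0]))
```

Checking every sampled position, rather than only the final one, mirrors the "at any time during its evaluation period" criterion.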
While there is of course a general strategy that solves the task for all environments (follow the light), the easiest gradients to follow in the loss landscape are initially those which produce forward locomotion in a single direction and cause the robot to ignore the light. This is because, from the robot's perspective, due to the inverse square law of light decay, improving its ability to move in the one environment with least loss earns quadratically more reward than improvements to locomotion in any of the other three environments in which the robot is less proficient. This causes the catastrophic forgetting experienced by neural learning algorithms.
C. The metrics.
We here define two metrics, $M_L$ and $M_{CF}$, that are measured over $k = 4$ environments. These metrics capture how a robot design shapes the weight space of the controller, and consequently how amenable to learning the robot would have been if the controller were learned with a standard learning algorithm rather than found by grid search. $M_L$ measures controller learnability: how easy it would be to learn a generalist controller. $M_{CF}$ measures resistance to catastrophic forgetting: the probability that an environment-specific controller will generalize to the other environments.
For each mechanical design $(\ell_1, \ell_2)$, we expect some optimal manifolds $\theta^*_k$ in the space of control parameters $(w_1, w_2)$ to succeed for a specific environment $k$. For a controller to be successful in multiple environments, it must reside within the intersection of the environment-specific manifolds, $\theta^*$, on the loss surface. Thus, the likelihood of finding a generalist controller (its learnability) will be proportional to the size of the intersection ($M_L$). Likewise, a controller's potential to resist catastrophic forgetting ($M_{CF}$) will be proportional to the ratio of generalist controllers (those successful in all four environments) to specialists (those successful in at least one environment).
Given a design $(\ell_1, \ell_2)$ and environment $k$, a binary success matrix, written here as $S_k$, can be built over the discretized weight grid: entry $(i, j)$ of $S_k$ is 1 if the controller with the $i$-th value of $w_1$ and the $j$-th value of $w_2$ succeeds in environment $k$, and 0 otherwise. By overlapping the success matrices for a fixed design across the four environments, we can visualize the manifolds $\theta^*_k$, where $k \in \{1, 2, 3, 4\}$, for the robot (Fig. 2).
We define the overlap $O$ as an element-wise sum of the success matrices over each environment $k$:

$$O = \sum_{k=1}^{4} S_k. \tag{1}$$

The learnability metric is simply the proportion of 4s (where a 4 represents success in all four environments) in the overlapped success matrices to the entire matrix space:

$$M_L = \frac{g_4(O)}{n^2}, \tag{2}$$

where $g_k$ is a function that counts the total elements of a matrix with value equal to $k$ and $n$ is the square dimension of the matrix defined by the discrete parameter sweep. Resistance to catastrophic forgetting is measured by:

$$M_{CF} = \frac{g_4(O)}{\sum_{k=1}^{4} g_k(O)}, \tag{3}$$

which is the number of control parameters that solved all four environments divided by the number of control parameters that solved at least one.
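The two metrics follow directly from the verbal definitions above: learnability is the fraction of weight settings that succeed in all four environments, and resistance to forgetting is that same count divided by the number of settings that succeed in at least one. The sketch below uses plain nested lists and hypothetical function names. Returning 0.0 when no controller succeeds anywhere is an assumption, since the text does not address that case.

```python
def overlap(success_matrices):
    """Element-wise sum of equally sized binary success matrices."""
    n = len(success_matrices[0])
    return [
        [sum(s[i][j] for s in success_matrices) for j in range(n)]
        for i in range(n)
    ]


def count_equal(matrix, k):
    """Number of entries equal to k (the counting function g_k in the text)."""
    return sum(row.count(k) for row in matrix)


def learnability(o, n_env=4):
    """Fraction of all weight settings that succeed in every environment."""
    n = len(o)
    return count_equal(o, n_env) / (n * n)


def forgetting_resistance(o, n_env=4):
    """Generalists divided by settings that succeed in at least one environment."""
    solved_any = sum(count_equal(o, k) for k in range(1, n_env + 1))
    # ASSUMPTION: define the ratio as 0.0 when nothing succeeds anywhere.
    return count_equal(o, n_env) / solved_any if solved_any else 0.0


# Toy 2-by-2 weight grid with one success matrix per environment.
toy = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[1, 0], [1, 0]],
]
o = overlap(toy)
print(o, learnability(o), forgetting_resistance(o))
```

In the toy grid, one of the four weight settings succeeds everywhere and three succeed somewhere, so the two metrics come out as 1/4 and 1/3.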
D. The theoretical model.
The location and orientation of the robot can be defined by a system of differential equations, where the change in position and orientation is determined by the light captured by the two sensors. Ignoring deviations from the idealized environment, such as sensor noise and friction, the angular and linear velocities will be proportional to linear combinations of the sensor values.
Let $\alpha(t)$ be the heading angle of the robot at time $t$, where $\alpha = 0$ denotes the positive $x$ direction, and let $\varphi(t) = (x(t), y(t))$ be the position of the robot in the world. If the robot is located at the origin and facing east ($\alpha = 0$), its two light sensors are located exactly at $\ell_1$ and $\ell_2$, and they capture some amount of light $s_1(t)$ and $s_2(t)$, respectively.
Hence the absolute position of the $i$-th sensor, written here as $p_i(t)$, is

$$p_i(t) = \varphi(t) + R_{\alpha(t)}\,\ell_i, \qquad R_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix},$$

where $R_\alpha$ is the two-dimensional counterclockwise rotation matrix (in the amount $\alpha$).
If we formulate the problem such that it is the robot's initial position and heading that is adjusted in each environment, instead of the position of the light source, we can assume that the source is always at the origin. Then, the distance of $\ell_i$ from the light source is given by:

$$d_i(t) = \lVert \varphi(t) + R_{\alpha(t)}\,\ell_i \rVert.$$

And since the intensity of light is inversely proportional to the square of the distance, the sensor values are given by:

$$s_i(t) = \frac{c}{d_i(t)^2},$$

where $c$ is a constant that we set equal to one.
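The sensor model just described (rotate the body-frame sensor offset by the heading, translate by the robot's position, then apply the inverse square law with the light at the origin) can be sketched in a few lines of Python. Function and argument names are hypothetical. No guard is included for a sensor sitting exactly on the light source, where the reading is undefined.

```python
import math


def sensor_world_position(robot_xy, alpha, sensor_local):
    """Rotate the body-frame sensor offset by alpha (counterclockwise)
    and translate it by the robot's world position."""
    x, y = robot_xy
    lx, ly = sensor_local
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    return (x + cos_a * lx - sin_a * ly, y + sin_a * lx + cos_a * ly)


def sensor_value(robot_xy, alpha, sensor_local, c=1.0):
    """Inverse-square light reading, with the light source at the origin."""
    px, py = sensor_world_position(robot_xy, alpha, sensor_local)
    return c / (px * px + py * py)


# A sensor at the body center, two units from the light, reads c / 4.
print(sensor_value((2.0, 0.0), 0.0, (0.0, 0.0)))
```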
Assuming the robot turns based on the difference between the weighted sensor values, and that its forward velocity is the average of the weighted sensor values, the following system of equations determines the location and orientation of the robot:

$$\frac{dx}{dt} = v(t)\cos\alpha(t), \qquad \frac{dy}{dt} = v(t)\sin\alpha(t), \qquad \frac{d\alpha}{dt} = w_1 s_1(t) - w_2 s_2(t),$$

where $v$ is the velocity of the robot given by $2v(t) = w_1 s_1(t) + w_2 s_2(t)$.
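To make the dynamics concrete, the sketch below rolls the model forward with a simple forward-Euler scheme. It is not the authors' solver. The step size, step count, the unit gain and sign of the turning term (taken here as $w_1 s_1 - w_2 s_2$, with sensor 1 on the robot's left), the use of the body center for the success test, and the coordinates chosen for the symmetric design are all assumptions made for illustration.

```python
import math


def simulate(l1, l2, w1, w2, start, alpha0, dt=0.01, steps=5000, tol=0.2):
    """Forward-Euler rollout with the light source at the origin.
    Returns True if the body center comes within tol of the light."""
    x, y = start
    alpha = alpha0
    for _ in range(steps):
        if math.hypot(x, y) < tol:
            return True
        cos_a, sin_a = math.cos(alpha), math.sin(alpha)
        readings = []
        for lx, ly in (l1, l2):
            # World position of the sensor: rotated offset plus robot position.
            px = x + cos_a * lx - sin_a * ly
            py = y + sin_a * lx + cos_a * ly
            # Inverse-square intensity with c = 1, guarded against d = 0.
            readings.append(1.0 / max(px * px + py * py, 1e-12))
        s1, s2 = readings
        v = 0.5 * (w1 * s1 + w2 * s2)  # from 2v = w1*s1 + w2*s2
        omega = w1 * s1 - w2 * s2      # ASSUMED turn rate (sign, unit gain)
        x += dt * v * cos_a
        y += dt * v * sin_a
        alpha += dt * omega
    return False


# ASSUMED coordinates for a symmetric, anteriormost sensor pair.
symmetric = ((0.5, 0.5), (0.5, -0.5))
print(simulate(*symmetric, 1.0, 1.0, start=(-3.0, 0.0), alpha0=0.0))
```

In the usage line the robot starts on the negative $x$ axis facing the light, so both sensors read the same value, the turning term vanishes, and the robot drives straight toward the source. Negating both weights drives it straight away instead.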
E. The empirical model.
Because our theoretical model is highly abstracted from the real world and built on a number of assumptions (no friction, motor limits, collisions, etc.) that may affect the robot's behavior, we also test our claims empirically by simulating the robots inside a physics engine.
The robot is simulated using Open Dynamics Engine (Fig. 1C). Just like the theoretical model, the simulated robot contains two light sensors, which innervate two motorized, spherical wheels (each with a single axis of rotation) that are attached midway along the sides of a 1 × 1 × 0.13 cm box. Additionally, an anterior passive castor wheel was added for balance. Finally, a light source is simulated on the floor of the environment at polar coordinates $(r, \phi)$ as a fixed sphere with radius 0.2 cm. In simulation, the behavior of a robot in a given environment is taken to be successful if it collides with the light source.
In order to replicate the baseline behavior of the canonical robot design, it was necessary to pre-optimize various physical attributes of the robot's body, including the mass of each component, the radii of the wheels, and the maximum torque, speed, and target actuation rate. A multiobjective optimization algorithm [12] was used to find a base morphology, with the sensors fixed in the canonical position, that was both performant and stable. The first objective was to maximize the performance of the robot (i.e., to minimize its distance from the light source), summed across all four environments. The second objective was to minimize the sum of the maximum torque, speed, and target actuation rate. This second objective is used to avoid both simulator instability and behavior that is unlikely to transfer to reality.
After discovering a good base morphology, we performed the nested grid search described in §II-A, for sensor locations $(\ell_1, \ell_2)$ and weights $(w_1, w_2)$.
For each evaluated mechanical design and controller (sensor locations and synapse weights), the robot's trajectory is computed in each of the four environments defined in §II-B. As in the empirical model, if the robot's trajectory comes within 0.2 units of the light source, the robot is determined to have succeeded in that environment. Otherwise, it is determined to have failed.

Fig. 3 (caption fragment): Under the empirical model, the design with the highest controller learnability was also the most resistant to catastrophic forgetting (C).

III. RESULTS
A. Theoretical results.
Although the design space we swept over contains many symmetrical sensor arrangements, and most real robots utilize symmetrical sensor distributions, the best designs are notably asymmetrical.
The mechanical design sketched in Fig. 3A (and its mirror image when reflected about the sagittal plane) had the highest controller learnability score, with $M_L = 0.286$. However, it did not score the best in resistance to catastrophic forgetting: the ratio of generalist controllers to controllers successful in at least one environment for that design was $M_{CF} = 0.636$, whereas several other found designs had full resistance, $M_{CF} = 1$. But those with a perfect ratio $M_{CF} = 1$ had much smaller optimal weight manifolds: the highest learnability score achieved by this group was $M_L = 0.206$. In other words, while all the successful environment-specific controllers for these designs generalize across all four environments, the manifold containing them is much smaller and thus would be more difficult to find if controller parameters were to be optimized by learning.
The canonical design had a much lower controller learnability ($M_L = 0.049$) and resistance to catastrophic forgetting ($M_{CF} = 0.24$) than many of the found asymmetrical designs.
For both the canonical, symmetrical design (Fig. 4) and the design with the highest controller learnability score (Fig. 5), there are initial conditions that generate persistent phototaxis: the robot moves toward the light source and remains near it. However, whereas all 35 of the phototaxing controllers found for the found design remain in the neighborhood of the source, only 2 of the 6 controllers found for the canonical design do so. Some initial conditions of the canonical design initially produce phototaxis, but the robot passes through the source and then continues to move away from it (Fig. 4A). This was not observed to occur with the "optimized" designs.

B. Empirical results.
As with the theoretical model, the empirical model showed that non-intuitive asymmetrical designs scored higher in learnability and in resistance to catastrophic forgetting. However, unlike the theoretical model, one design performed the best on both metrics.
The found asymmetrical design shown in Fig. 3C had both the highest generalist controller learnability ($M_L = 0.0039$) and the highest resistance to catastrophic forgetting ($M_{CF} = 0.038$). Overall, there were 57 generalist phototaxing controllers found (out of 14641 evaluated; 0.389%) for this design, compared to only one generalist phototaxing controller found (0.0068%) for the canonical, symmetrical design. The controller learnability of the canonical design was thus $M_L = 0.000068$, and its resistance to catastrophic forgetting was $M_{CF} = 0.00052$. Thus, the found asymmetrical design has both higher controller learnability and higher resistance to catastrophic forgetting.
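The learnability values quoted above follow from dividing the number of generalist controllers by the 14641 weight configurations evaluated per design. A short check, with hypothetical variable names:

```python
n_controllers = 121 ** 2          # weight configurations per design

found_ml = 57 / n_controllers     # asymmetrical design, 57 generalists
canonical_ml = 1 / n_controllers  # canonical design, 1 generalist

print(f"{found_ml:.4f}", f"{100 * found_ml:.3f}%")
print(f"{canonical_ml:.6f}", f"{100 * canonical_ml:.4f}%")
```

The ratio of the two learnability scores is simply 57, the ratio of the generalist counts.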

C. Overview.
In Fig. 6, the weight manifolds for all of these designs under both the theoretical and empirical models can be seen in detail, where cyan represents weight assignments that succeed in all four environments for a given design. These weight manifolds show clearly that, in this case, the weight assignments for the asymmetrical designs would be much easier for a learning algorithm to find, while finding those of the canonical design is akin to looking for a needle in a haystack. Fig. 7 plots the frequency of metrics $M_L$ and $M_{CF}$ (Eqs. 2 and 3, respectively) within each bin of the grid search. This again shows that there are many designs (including intuitive symmetric ones) that score poorly on $M_L$ and $M_{CF}$, while there are relatively few designs that perform well. Thus the choice of design has a drastic effect on the theoretical learnability of a robot's controller parameters.
IV. DISCUSSION
In this paper, we considered a simple robot and task in order to sample the entire loss landscape of the weight manifold at a relatively high resolution. While we have not tested these robots with any specific learning algorithm, our results suggest that changes in one element of a robot's design (sensor location) can fundamentally alter the loss surface, thus influencing the controller's learnability and resistance to catastrophic forgetting. More specifically, by changing sensor location, we observed changes in the number and placement along the loss surface of control parameters suitable for individual environments, as well as in how these optimal yet environment-specific parameters overlapped across different environments to produce generalist controllers that resist catastrophic forgetting. However, we acknowledge that this work mainly builds a theoretical foundation and that our metrics need to be tested against existing methods for learning.
Previous efforts to avoid catastrophic forgetting have relied almost exclusively on increased control complexity. Most were focused on making changes to small subsets of neural network weights [15,20,8,24,14,1,25,26]. Others have attempted to sidestep the problem by learning good initial weights such that they can be quickly updated when switching between tasks [7,10]. We have shown here that, in theory, regardless of the algorithm used it is also possible to alleviate catastrophic forgetting by changing aspects of the robot's design, without increasing control complexity, but doing so can be nonintuitive. We found that even the seemingly trivial case of phototaxis with contralateral connections described by Braitenberg [3] can require morphological tuning to work as expected in a single simulated environment, and that, when challenged to perform in additional environments, other adjustments in morphology, specifically to sensor location, could either suppress or multiply the potential for catastrophic forgetting by expanding or shrinking the overlap of performant controller settings for that body plan across different task environments.
The physical location of sensors is thus a relevant property of robots that is nevertheless abstracted away in the (mostly disembodied) systems that address catastrophic forgetting reported in the literature to date. While sensor location could in principle be dynamically controlled via a lattice of sensors [16] or adjustable antennae [6], change in (and rational control over) other morphological attributes, such as geometry [17], material properties [21], or the number and placement of actuators [19], is much more difficult in practice, and such design elements are almost always presupposed and fixed prior to training [5].
However, unless experimental proof is obtained in the real world, this theory will remain speculation. In fact, it is possible that the proposed empirical model using rigid body physics was more disconnected from reality than our theoretical model. The simulated wheels, for instance, have just a single point of contact with the ground. A more realistic surface contact geometry might completely change the optimal sensor locations, but there is also reason to believe that the loss surface manifolds containing adequate controllers for a compliant body could be larger than those of a rigid body [17,13], further increasing the probability of overlap across tasks.
In the limit, machines with the right morphology may use a single controller to accomplish a set of tasks that appear disparate to a robot with a different body plan. For example, a granular jamming gripper [4] need not precisely control the placement of each joint around differently shaped objects: a single policy (vacuum air, hold, relax) works regardless of object shape. However, this control policy is exceedingly simple. The degree to which morphology influences learnability in more complex robots, task environments and behaviors has yet to be investigated, but will be the focus of future work.
In this work, two control and two morphology parameters were optimized. In future work we will investigate whether co-optimizing the morphology and control parameters confers greater overall learnability on the robot compared to a robot with a fixed morphology and four control parameters. This will help determine whether a poorly chosen mechanical design can be compensated for by increased control complexity.

Fig. 6 (caption fragment): Under the controller sweep in (D-F), the design with the highest controller learnability also had the greatest resistance to forgetting, so E and F are identical. Each pixel represents a different controller $(w_1, w_2)$ for the given design, and is colored by the number of environments in which the combination successfully exhibited phototaxis (i.e., the overlapped binary success matrices, defined by Eq. 1). Under both the theoretical and empirical models, the unintuitive asymmetrical designs (B, E) were found to have higher controller learnability in their landscape than their respective canonical design (A, D), as measured by the number of pixels in the heatmap that are successful in all four environments (cyan). Likewise, the asymmetrical designs (C, F) had higher resistance to catastrophic forgetting, as measured by the ratio of cyan pixels to non-blue pixels.