Picture of me

William Dorrell

Serious Things

Research | CV | Resources | Gatsby TN Notes

Not-so-serious Things

Learning minimally hierarchical control

Less Technical

When you reach to press a button most of the hard work of control is swept under the carpet. Rather than having to think 'contract left anterior deltoid, relax right bicep etc. etc.' you just think 'press the button.' In fact, even the process of reaching requires the careful control of tens of muscle but it seems effortless. This level of effortlessness far exceeds the current artificial equivalents. We tried to copy some of this effortless motor control in a very simple setup, using a controller that is able to apply (simulated) torques to a (simulated) pendulum.

Our controller had two levels, like two managers. The higher level manager set goals for the lower level manager to fufill, for example by saying 'get the pendulum to 90 degrees', but it didn't worry about how to get there. The lower level manager took the goal set by its boss and exerted torques in order to reach it. So the higher level manager is a bit like you thinking 'press the button', while the lower level manager is like your subconscious that performs the difficult task of choosing which muscles to contract to reach its goal.

We were able to adapt modern learning algorithms to this two leveled task, and we hope that future extensions of this scheme will prove useful in both improved artificial control as well as shedding light on how your body achieves the fine motor skills it is capable of - which could perhaps help in medical cases where patients lose motor control.

More Technical

As part of a broader piece of work I created an easily trained actor-critic architecture for bio-plausible two-level control of a pendulum. The bottom level consists of feedback controllers that when given a target angle exert torques proportional to the difference between current and target position. The higher level consists of a network that sets goal angles. An added difficulty of working with a periodic variable (the angle of the pendulum) is that, depending on the position of 0 degrees round the circle - i.e. the co-ordinate choice, the feedback controller may not behave optimally. For example, if the goal angle is 1 degree, and the current angle is 359 a simple feedback controller might exert a exert a large negative rather than a small positive torque. To sidestep this we included two feedback controllers with 0 points 180 degrees away from one another. It therefore became the task of the higher level manager to choose not only the desired angle, but also which feedback controller to use.

Our setup used bio-plausible reinforcement learning to learn both the goal angle (fourth column in the image below), and which controller to use in each region (third column). Hence, we show a level of hierarchical control in a non-monotonic setup, a possible starting point for future explorations of hierarchical reinforcement learning.

Read the paper here.

This line of work has since been extended by Sergio, the post-doc I was lucky to work with, into a full-blown model of arm-reaching with neurons here!

Okinawa Research Picture

Figure 1: On the right, the actor-critic architecture that chose an angle to track to. On the left, a comparison of initial and final control. At the beginning the controller struggles: the red line does not track the blue, whereas by the end it is tracking closely. The three diagrams show the choices made by the controller, in this case an optimal policy that tracks from current to desired angle.