<latex>{\fontsize{16pt}\selectfont \textbf{Optimal and Learning Balance Control}} </latex>
<latex>{\fontsize{16pt}\selectfont \textbf{of an Exoskeleton}} </latex>

<latex>{\fontsize{12pt}\selectfont \textbf{Mathias Wild}} </latex>
<latex>{\fontsize{10pt}\selectfont \textit{Master Project RSC}} </latex>

<latex> {\fontsize{12pt}\selectfont \textbf{Abstract}} </latex>

In this project, optimal control is applied to the balance problem of bipedal robots. The goal was to achieve balance, including stepping, relying only on the optimization algorithm, avoiding state machines or precomputed trajectories. The chosen approach is Model-Predictive Control (MPC) based on the iterative Linear Quadratic Gaussian Regulator (iLQG). The robot simulation is implemented in the Control Toolbox (CT). MPC was implemented both in a single-node simulation, in which the simulation waits for the algorithm to finish before continuing, and in a simulation using three nodes running concurrently, in which the MPC algorithm is executed in parallel to the simulation. While balance was achieved for quite significant disturbances, the goal of stepping was not reached.

<latex> {\fontsize{12pt}\selectfont \textbf{Optimal Control and Model-Predictive Control}} </latex>

The aim of optimal control is to apply forces to a system in order to minimize a given cost function. In this context, cost function is used as a rather generic term describing any measure that is considered in the optimization, such as the magnitude of the applied forces or the distance to a desired trajectory. The costs describe the control goals, which may be conflicting: the goal of using as little control force as possible by itself leads to a very different controller than the goal of following a trajectory closely. In the cost function these measures are associated with weights and added up. The optimization then finds the control policy that minimizes the total cost 1). The controller behaviour is therefore largely determined by the cost function, i.e. the choice of which measures to penalise and how strongly. For a stochastic system with a finite time horizon, such a cost function can be described by
<latex>{\begin{equation*} J = E \left[ h (\mathbf{x}(t_f)) + \int_{t_0}^{t_f} \ell (\mathbf{x}(t), \mathbf{u}(t)) dt \right] \end{equation*}</latex>

where $\ell(\mathbf{x}(t), \mathbf{u}(t))$ are the intermediate costs, $h(\mathbf{x}(t_f))$ is the final cost, $t_0$ and $t_f$ are the start and end time respectively, and $E$ denotes the expected value 2). Note that the total cost $J$ as well as the intermediate costs $\ell$ are scalar values, despite the fact that the state $\mathbf{x}(t)$ and control signal $\mathbf{u}(t)$ are vectors.
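
As a purely illustrative example (the notation below is generic and not the exact cost used in this project), a quadratic intermediate cost for a balance task could penalise the deviation of the state from a desired upright configuration $\mathbf{x}_{des}$ as well as the magnitude of the applied joint torques,

<latex>\begin{equation*} \ell(\mathbf{x}, \mathbf{u}) = \frac{1}{2} (\mathbf{x} - \mathbf{x}_{des})^\intercal \mathbf{W}_x (\mathbf{x} - \mathbf{x}_{des}) + \frac{1}{2} \mathbf{u}^\intercal \mathbf{W}_u \mathbf{u} \end{equation*}</latex>

where the diagonal weight matrices $\mathbf{W}_x$ and $\mathbf{W}_u$ encode the relative importance of staying close to the reference versus using small torques.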

<latex> {\fontsize{8pt}\selectfont \textbf{iLQG}} </latex>

To what degree and how easily the optimization problem defined above can be solved depends on the underlying system, the form of the costs, and what assumptions are made about both. The well-known Linear Quadratic Gaussian Regulator (LQG), for example, assumes linear dynamics, quadratic costs and Gaussian noise. The iterative Linear Quadratic Gaussian Regulator (iLQG), in contrast, makes hardly any assumptions about the underlying system, which is described by the stochastic differential equation

<latex>{\begin{equation*} \text{d}\mathbf{x} = f(\mathbf{x}, \mathbf{u}) \text{d}t + \mathbf{C} (t) \text{d}\mathbf{\Omega} \end{equation*}</latex>

where $\mathbf{\Omega}$ is state- and control-independent Brownian motion noise. Instead of trying to solve the problem directly, the optimal control policy is approximated locally along a nominal trajectory.

In a first step, a trajectory is calculated using a deterministic approximation of the system, $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u})$, up to a finite time horizon $t_f$, storing the nominal states $\bar{\mathbf{x}}$ and control signals $\bar{\mathbf{u}}$ along the way. Second, time is discretized into $K$ steps with step size $\Delta t = \frac{t_f}{K-1}$, along with all time-varying quantities, resulting in $\bar{\mathbf{x}}_k$ and $\bar{\mathbf{u}}_k$ for all $K$ steps. Third, the system dynamics are linearized and the cost function is quadratized along the nominal trajectory, leading to

<latex>{\begin{equation*} \delta \mathbf{x}_{k+1} = \mathbf{A}_k \delta \mathbf{x}_k + \mathbf{B}_k \delta \mathbf{u}_k \end{equation*}</latex>

and

<latex>{\begin{equation*} J = q_K +\delta \mathbf{x}_K^\intercal \mathbf{q}_K + \frac{1}{2}\delta\mathbf{x}_K^\intercal \mathbf{Q}_K \delta\mathbf{x}_K + \sum_{k=0}^{K-1} \left[ q_k + \delta\mathbf{x}_k^\intercal \mathbf{q}_k + \frac{1}{2}\delta\mathbf{x}_k^\intercal \mathbf{Q}_k \delta\mathbf{x}_k + \delta \mathbf{u}_k^\intercal \mathbf{r}_k + \frac{1}{2}\delta \mathbf{u}_k^\intercal\mathbf{R}_k \delta \mathbf{u}_k\right] \end{equation*}</latex>

where $\delta \mathbf{x}_k$ and $\delta \mathbf{u}_k$ are the deviations from the nominal trajectory. In a fourth step, the optimization problem is solved recursively, similar to an LQG problem, calculating the optimal feed-forward signal update ($\mathbf{l}_k$) and feedback controller ($\mathbf{L}_k$) along the way, which together yield an updated nominal control signal.

These steps are repeated until the difference between successive nominal control trajectories falls below a threshold. The feed-forward update, the nominal control signal, and the feedback gain applied to the nominal state can be combined into an overall feed-forward term ($\mathbf{u}_{k,ff}$), so that the measured system state can be fed directly into the feedback term without any further transformation, resulting in the control policy $\mu_k$ for each step $k$

<latex>{\begin{equation*} \mu_k = \mathbf{u}_{k,ff} + \mathbf{L}_k \mathbf{x}_k. \end{equation*}</latex>
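
To make the structure of these four steps more concrete, the following is a minimal sketch of a single iLQG iteration in C++ with Eigen. It is not the Control Toolbox implementation used in this project: the dynamics, linearization and quadratization are passed in as placeholder callbacks, the control-state cross term is omitted, and the line search over the update step is only hinted at in a comment.

<code cpp>
#include <Eigen/Dense>
#include <functional>
#include <vector>

using Eigen::MatrixXd;
using Eigen::VectorXd;

// Second-order expansion of the cost at one time step (placeholder container).
struct QuadCost { double q; VectorXd qv; MatrixXd Q; VectorXd rv; MatrixXd R; };

// One iLQG iteration: forward rollout, linearization/quadratization,
// LQG-like backward pass, and update of the nominal control signal.
void ilqgIteration(
    const std::function<VectorXd(const VectorXd&, const VectorXd&)>& f,   // x_{k+1} = f(x_k, u_k)
    const std::function<void(const VectorXd&, const VectorXd&,
                             MatrixXd&, MatrixXd&)>& linearize,            // fills A_k, B_k
    const std::function<QuadCost(const VectorXd&, const VectorXd&, bool)>& quadratize,
    std::vector<VectorXd>& xbar,   // nominal states, length K
    std::vector<VectorXd>& ubar)   // nominal controls, length K-1
{
  const int K = static_cast<int>(xbar.size());

  // Step 1: roll out the deterministic model along the current nominal controls.
  for (int k = 0; k + 1 < K; ++k)
    xbar[k + 1] = f(xbar[k], ubar[k]);

  // Steps 3 and 4: linearize/quadratize along the (already discrete) nominal
  // trajectory and compute l_k and L_k in an LQG-like backward pass.
  QuadCost fin = quadratize(xbar[K - 1], VectorXd::Zero(ubar[0].size()), true);
  VectorXd s = fin.qv;              // gradient of the cost-to-go
  MatrixXd S = fin.Q;               // Hessian of the cost-to-go
  std::vector<VectorXd> l(K - 1);   // feed-forward updates l_k
  std::vector<MatrixXd> L(K - 1);   // feedback gains L_k

  for (int k = K - 2; k >= 0; --k)
  {
    MatrixXd A, B;
    linearize(xbar[k], ubar[k], A, B);
    QuadCost c = quadratize(xbar[k], ubar[k], false);

    VectorXd g = c.rv + B.transpose() * s;      // gradient of the Q-function w.r.t. du
    MatrixXd G = B.transpose() * S * A;         // du-dx coupling
    MatrixXd H = c.R + B.transpose() * S * B;   // Hessian w.r.t. du

    l[k] = -H.ldlt().solve(g);
    L[k] = -H.ldlt().solve(G);

    // Propagate the quadratic value function backwards in time.
    s = c.qv + A.transpose() * s + G.transpose() * l[k] + L[k].transpose() * (g + H * l[k]);
    S = c.Q + A.transpose() * S * A + L[k].transpose() * H * L[k]
        + L[k].transpose() * G + G.transpose() * L[k];
  }

  // Update the nominal control signal; a full implementation would scale l_k by a
  // line-search parameter and re-roll the trajectory with the feedback active.
  for (int k = 0; k + 1 < K; ++k)
    ubar[k] += l[k];
}
</code>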

<latex> {\fontsize{8pt}\selectfont \textbf{MPC}} </latex>

Also known as receding-horizon control, the MPC framework retains the advantage of optimal control, namely generating complex behaviour from specified high-level cost functions, but avoids the curse of dimensionality, the main drawback of dynamic programming. This is achieved by repeatedly solving a finite-horizon optimal control problem with a relatively short time horizon, thus only considering states that are actually visited 3).

In an idealised case, the calculation of a new controller takes no time: when new sensor readings become available, a new controller is calculated for a certain time horizon $t_{i, h}$. Since in reality it does take time to find a new controller, this case was implemented by updating the controller at fixed intervals $\tau$ and halting the simulation until the computation is completed. In other words, as far as the simulation is concerned, the update happens instantaneously. The controller is then used until the next update. Generally, this update happens frequently and well before the horizon of the current controller is reached.
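
A minimal sketch of this blocking scheme is shown below, assuming the iLQG solve, the controller evaluation, and the simulation step are available as callbacks. All names here are placeholders for illustration, not the project's actual interfaces.

<code cpp>
#include <Eigen/Dense>
#include <functional>

using Eigen::VectorXd;

// Opaque controller handle (feed-forward terms and feedback gains from iLQG).
struct Controller {};

// Idealised (blocking) MPC loop: the simulation halts while a new controller
// is computed, so from its point of view the update happens instantaneously.
void idealisedMpc(
    const std::function<Controller(const VectorXd&, double)>& runILQG,                 // solve from state x over horizon
    const std::function<VectorXd(const Controller&, const VectorXd&, double)>& evalController,
    const std::function<VectorXd(const VectorXd&, const VectorXd&, double)>& simulateStep,
    VectorXd x, double dt, double tau, double horizon, double tEnd)
{
  Controller ctrl = runILQG(x, horizon);   // initial controller
  double tSinceUpdate = 0.0;

  for (double t = 0.0; t < tEnd; t += dt)
  {
    if (tSinceUpdate >= tau)
    {
      // Fixed-interval update; the simulation simply waits for this call to return.
      ctrl = runILQG(x, horizon);
      tSinceUpdate = 0.0;
    }
    VectorXd u = evalController(ctrl, x, tSinceUpdate);   // currently active policy
    x = simulateStep(x, u, dt);                           // advance the simulation
    tSinceUpdate += dt;
  }
}
</code>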

MPC Schematic

Idealised MPC Schematic

In a more realistic situation, computing a new controller takes time. This means the controller will be based on states that are no longer current. Also, the considered time horizon must be longer than the expected computation time for a new controller. The time required can vary, as indicated in the figure above. To simulate this, the simulation needs to continue while a new controller is calculated.
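
Conceptually, this corresponds to the controller computation running in parallel with the simulation, working on a snapshot of the state that is already outdated by the time the new controller is swapped in. The thread-based sketch below only illustrates this timing pattern in a single process with placeholder functions; in the project itself the same pattern is realised with separate ROS nodes, as described in the software section.

<code cpp>
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Placeholder stand-ins for the real solver and simulator.
std::vector<double> runILQG(const std::vector<double>& x0) { return x0; }
std::vector<double> simulateStep(const std::vector<double>& x,
                                 const std::vector<double>& ctrl) { return x; }

std::mutex mtx;
std::vector<double> latestState(24, 0.0);        // placeholder state dimension
std::vector<double> activeController(24, 0.0);   // controller currently applied
std::atomic<bool> running{true};

// The MPC thread keeps recomputing controllers from the most recent state;
// each solve takes a variable amount of time, during which the state moves on.
void mpcThread()
{
  while (running)
  {
    std::vector<double> x0;
    { std::lock_guard<std::mutex> lock(mtx); x0 = latestState; }
    std::vector<double> ctrl = runILQG(x0);   // slow; the state becomes stale here
    std::lock_guard<std::mutex> lock(mtx);
    activeController = ctrl;                  // swapped in whenever it is ready
  }
}

int main()
{
  std::thread mpc(mpcThread);
  for (int k = 0; k < 10000; ++k)              // the simulation keeps running regardless
  {
    std::lock_guard<std::mutex> lock(mtx);
    latestState = simulateStep(latestState, activeController);
  }
  running = false;
  mpc.join();
  return 0;
}
</code>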

There are a number of possible control strategies that could be applied within an MPC framework. In this project, iLQG was used to compute the controllers: at each controller update, the iLQG algorithm described in the previous section is executed.

<latex> {\fontsize{12pt}\selectfont \textbf{Software Implementation}} </latex>

The simulation was implemented using the ADRL Control Toolbox (CT). It is organised as a set of ROS packages, with ROS mainly used to handle the communication between the different components. In a first step, a model of the exoskeleton was defined using the Robotics Code Generator 4).

 Robot Model

The CT is designed to use code generated in this fashion. The selected model has 11 joints, including most of the joints that will be present in the exoskeleton. To better capture the behaviour of the system with a human attached, a trunk was attached to the hip; this joint will not be present in the exoskeleton.

Code Implementation Schematic

The simulation was implemented as one ROS node. It includes a mechanism to apply a disturbance to any link. Most parameters for the controllers as well as for the simulation are set in parameter files rather than hard-coded. This speeds up testing significantly, as parameters can be changed easily and quickly without recompiling each time. When running the MPC iLQG controller in the ideal scenario, the controller update as well as the controller computation are executed on this same node. Because of this, the simulation does not continue until a new controller becomes available.
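
As an illustration of this approach, parameters can be read from the ROS parameter server at node start-up; the parameter names, default values and units below are purely hypothetical and do not correspond to the project's actual configuration keys.

<code cpp>
#include <ros/ros.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "exo_simulation");
  ros::NodeHandle nh("~");   // private namespace, filled from a parameter file at launch

  // Hypothetical parameter names; the real keys depend on the project's config files.
  double timeHorizon, dt, disturbance;
  nh.param("mpc/time_horizon", timeHorizon, 0.5);         // [s]
  nh.param("simulation/dt", dt, 0.001);                   // [s]
  nh.param("disturbance/magnitude", disturbance, 50.0);   // [N]

  ROS_INFO("horizon %.2f s, dt %.4f s, disturbance %.1f N",
           timeHorizon, dt, disturbance);
  return 0;
}
</code>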

For a more realistic simulation, the controller update and the controller computation were moved to separate nodes. The schematic on the right illustrates this setup. All nodes run concurrently. The simulation node updates the states and moves forward in time, publishing the current robot state at each step. The controller node receives the states and computes the control signal based on the currently active controller, publishing the control signal. It also publishes the robot state again for the controller update in the MPC node. In the current setup, it seems redundant for the controller node to also publish the robot state; however, this is only because the simulation currently publishes the robot states directly. Later implementations might publish only sensor readings and use the controller node to perform state estimation as well, before publishing the estimated robot state to the MPC node. The MPC node executes the iLQG algorithm based on the most recent robot state, using a previously defined cost function. Once the new controller is computed, it is published for the controller node to update its active controller, before a new iLQG run is started based on the most recent states.
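
The sketch below illustrates how the MPC node in such a setup could be structured with roscpp: it caches the most recent robot state from a subscription, runs an iLQG solve on it, and publishes the result for the controller node. The topic names, the use of Float64MultiArray messages, and the runILQG placeholder are assumptions for illustration only, not the project's actual interfaces.

<code cpp>
#include <ros/ros.h>
#include <std_msgs/Float64MultiArray.h>
#include <mutex>
#include <vector>

std::mutex stateMutex;
std::vector<double> latestState;   // most recent robot state received on the topic

void stateCallback(const std_msgs::Float64MultiArray::ConstPtr& msg)
{
  std::lock_guard<std::mutex> lock(stateMutex);
  latestState = msg->data;
}

// Placeholder for the actual iLQG solve (provided by the Control Toolbox in the
// project); here it just echoes the state so that the sketch stays self-contained.
std::vector<double> runILQG(const std::vector<double>& x0) { return x0; }

int main(int argc, char** argv)
{
  ros::init(argc, argv, "mpc_node");
  ros::NodeHandle nh;

  ros::Subscriber stateSub = nh.subscribe("robot_state", 1, stateCallback);
  ros::Publisher  ctrlPub  = nh.advertise<std_msgs::Float64MultiArray>("controller_update", 1);

  ros::AsyncSpinner spinner(1);   // handle incoming state messages in the background
  spinner.start();

  while (ros::ok())
  {
    std::vector<double> x0;
    {
      std::lock_guard<std::mutex> lock(stateMutex);
      x0 = latestState;
    }
    if (x0.empty())
    {
      ros::Duration(0.01).sleep();   // no state received yet
      continue;
    }

    // One full iLQG solve based on the most recent state; the simulation and
    // controller nodes keep running while this computation takes place.
    std_msgs::Float64MultiArray ctrlMsg;
    ctrlMsg.data = runILQG(x0);
    ctrlPub.publish(ctrlMsg);
  }
  return 0;
}
</code>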

Base Position, ideal case

<latex> {\fontsize{12pt}\selectfont \textbf{Results \& Conclusion}} </latex>

The optimal control approach did not quite reach the desired goals. While the iLQG algorithm by itself, after lengthy parameter tuning, does manage to recover balance, it is not very robust to small changes in the cost function or task. The same can be said for the realistic MPC implementation. While some cost weights were found that did lead to stability, varying delays in the controller computation alone led to vastly different outcomes, especially, but not exclusively, when these delays are long. The only motion that approached stepping was a small hop, triggered by the perturbation, when the ideal MPC controller was active with control weights that heavily penalised ankle torques. The base position trajectory for this case is presented in the figure above.

Despite the difficulties, the possible gains of optimal balance control are well worth the effort. Erez, Lowrey, Tassa, et al. 5) use a similar approach with much greater success. However, they use a simulator that is specialised to handle contacts, they employ a state machine to switch between different cost functions depending on the task and situation, and they use much more computational power. Their work would certainly be a good place to start for any follow-up project that expands and uses the software infrastructure built in this project.

1)
R. F. Stengel, Optimal Control and Estimation, 1994
2)
J. Buchli, F. Farshidian, M. Giftthaler, J. Pfleging, T. Sandy, and A. Winkler, Lecture notes: Optimal and Learning Control for Autonomous Robots, 2015
3)
Y. Tassa, T. Erez, and E. Todorov, “Synthesis and stabilization of complex behaviors through online trajectory optimization”, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2012, pp. 4906-4913.
5)
T. Erez, K. Lowrey, Y. Tassa, V. Kumar, S. Kolev, and E. Todorov, “An integrated system for real-time model predictive control of humanoid robots”, in 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), IEEE, 2013, pp. 292-299.