<latex>{\fontsize{16pt}\selectfont \textbf{Learning Rigid Body Dynamics for}} </latex>
<latex>{\fontsize{16pt}\selectfont \textbf{Optimal Control}} </latex>

<latex>{\fontsize{12pt}\selectfont \textbf{Luciano Beffa}} </latex>
<latex>{\fontsize{10pt}\selectfont \textit{Semester Project, RSC}} </latex>

<latex>{\fontsize{12pt}\selectfont \textbf{Abstract}} </latex>

Optimal robot motion planning and control methods are based on a model of the robot's rigid body dynamics. Analytical modelling methods derive a mathematical description of the robot while typically relying on numerous assumptions, such as frictionless mechanics, neglected aerodynamic forces and torques, or disregarded mechanical couplings.
In this project, a framework is developed that allows nearly any model error to be incorporated by augmenting the analytical model with an error term that is learned from experimental data using locally weighted projection regression (LWPR), a nonlinear regression tool. Based on this improved model, two optimal control approaches are designed and verified in simulation: LQR and trajectory optimization using direct transcription.
The proposed control designs are shown to outperform nominal designs in terms of optimality, required feedback control action, and tracking performance, demonstrating that the proposed approach is suitable for this class of control problems.

<latex>{\fontsize{12pt}\selectfont \textbf{Learning Model Error in Forward Dynamics}} </latex>

Traditional optimal control strategies for robot motion planning are based on an analytical model of the underlying rigid body dynamics (RBD). However, as the complexity of the system grows, potential sources of uncertainty and modelling error multiply, and their effects accumulate. In order to obtain a model that more accurately represents the real system, we propose to learn these errors in the forward dynamics from experimental data by approximating the real rigid body dynamics as

\begin{equation*} \dot{x} = f(x,u) + \phi(x,u) = \bar{f}(x,u), \label{eq:real_mdl} \end{equation*}

where $f(x,u)$ are the analytical dynamics derived from first principles, and the model error function $\phi(x,u)$ is found by formulating a nonlinear regression problem with input $t$ and output $y$ as follows:

\begin{equation*} \begin{split} t &= [x,u] \\ y &= \phi(x,u) = \dot{x}-f(x,u). \end{split} \end{equation*}

The figure on the right summarizes the idea: model errors are assumed to be additive and to depend on both the state and the control input.

There are different learning algorithms that can serve as nonlinear function approximators. In this work, locally weighted projection regression1) (LWPR) is used, since it handles high-dimensional input spaces and irrelevant inputs at relatively low computational cost. In LWPR, the function approximation is achieved by a weighted sum of local linear models, each found by performing regression within a lower-dimensional subspace of the full input space.

The algorithm is trained on $N$ data samples $\lbrace t_i,y_i \rbrace_{i=1}^{N}$ collected from experiments on the real robot.
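As a concrete illustration, the following minimal sketch shows how the training pairs could be assembled from logged data and how the learned error is added back to the analytical dynamics. The array names, the `f_analytical` function, and the use of scikit-learn's `MLPRegressor` as a stand-in for an LWPR implementation are assumptions made here for illustration only.

<code python>
# Sketch: build (t_i, y_i) pairs from logged data and fit a nonlinear
# regressor to the model error phi(x, u). A generic regressor stands in
# for LWPR here; the data arrays and f_analytical are assumed inputs.
import numpy as np
from sklearn.neural_network import MLPRegressor  # stand-in for LWPR

def build_training_data(X, U, Xdot, f_analytical):
    """Regression input t = [x, u], target y = xdot - f(x, u)."""
    T = np.hstack([X, U])
    F = np.array([f_analytical(x, u) for x, u in zip(X, U)])
    return T, Xdot - F

def fit_model_error(T, Y):
    """Fit the model error phi with a generic nonlinear regressor."""
    phi = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000)
    phi.fit(T, Y)
    return phi

def hybrid_dynamics(x, u, f_analytical, phi):
    """Improved model f_bar(x, u) = f(x, u) + phi(x, u)."""
    t = np.hstack([x, u]).reshape(1, -1)
    return f_analytical(x, u) + phi.predict(t).ravel()
</code>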

Finally, the improved model is used in existing optimal control approaches as described below.

<latex>{\fontsize{12pt}\selectfont \textbf{Linear-Quadratic Regulator Design}} </latex>

One way to stabilize a system about a given equilibrium configuration $\bar{x}_0,\bar{u}_0$ is to formulate the infinite-horizon linear-quadratic regulator (LQR) problem, i.e. to find a state-feedback control policy that minimizes

\begin{equation*} J = \int_0^\infty x^T Q x + u^T R u \ dt. \end{equation*}

It can be shown2) that the optimal solution is a constant state-feedback control law

\begin{equation*} u = -R^{-1} B^T P x = -Kx, \end{equation*}

depending on the system matrices $A$ and $B$ and the cost matrices $Q$ and $R$ through the continuous-time algebraic Riccati equation

\begin{equation*} 0 = PA + A^T P - P B R^{-1} B^T P + Q. \end{equation*}
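For illustration, the gain computation from the Riccati equation could be carried out numerically with SciPy as in the following sketch; the system matrices $A$ and $B$ (obtained as described below) and the weights $Q$ and $R$ are assumed to be given, and this is not necessarily the project's actual implementation.

<code python>
# Sketch: infinite-horizon LQR gain from the continuous-time algebraic
# Riccati equation, assuming A, B, Q, R are given.
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    # Solve 0 = P A + A^T P - P B R^{-1} B^T P + Q for P
    P = solve_continuous_are(A, B, Q, R)
    # K = R^{-1} B^T P ; feedback about the equilibrium: u = u0 - K (x - x0)
    K = np.linalg.solve(R, B.T @ P)
    return K, P
</code>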

Given the improved model of the RBD, the system matrices can be found as

\begin{equation*} \begin{split} A &= \left. \frac{\partial \bar{f}(x,u)}{\partial x} \right|_{\substack{x=\bar{x}_0 \\ u=\bar{u}_0}} = \left. \frac{\partial f(x,u)}{\partial x} \right|_{\substack{x=\bar{x}_0 \\ u=\bar{u}_0}} + \left. \frac{\partial \phi(x,u)}{\partial x} \right|_{\substack{x=\bar{x}_0 \\ u=\bar{u}_0}} \\ B &= \left. \frac{\partial \bar{f}(x,u)}{\partial u} \right|_{\substack{x=\bar{x}_0 \\ u=\bar{u}_0}} = \left. \frac{\partial f(x,u)}{\partial u} \right|_{\substack{x=\bar{x}_0 \\ u=\bar{u}_0}} + \left. \frac{\partial \phi(x,u)}{\partial u} \right|_{\substack{x=\bar{x}_0 \\ u=\bar{u}_0}}. \end{split} \end{equation*}

The system matrices $A$ and $B$ are thus the sum of the analytical system matrices and the Jacobians of the learned nonlinear model error, which are straightforward to use in the standard LQR design.
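A sketch of how these matrices might be assembled numerically is given below, assuming the analytical Jacobians and a prediction function `phi_eval` for the learned error are available; the finite-difference scheme and the function names are illustrative assumptions rather than the project's implementation.

<code python>
# Sketch: A and B as the sum of analytical Jacobians and finite-difference
# Jacobians of the learned error phi, evaluated at (x0, u0).
import numpy as np

def jacobian_fd(func, z0, eps=1e-6):
    """Central finite-difference Jacobian of a vector-valued function."""
    z0 = np.asarray(z0, dtype=float)
    f0 = np.asarray(func(z0))
    J = np.zeros((f0.size, z0.size))
    for i in range(z0.size):
        dz = np.zeros_like(z0)
        dz[i] = eps
        J[:, i] = (np.asarray(func(z0 + dz)) - np.asarray(func(z0 - dz))) / (2.0 * eps)
    return J

def linearize_hybrid(A_analytical, B_analytical, phi_eval, x0, u0):
    """A, B = analytical Jacobians + Jacobians of the learned error at (x0, u0)."""
    A = A_analytical + jacobian_fd(lambda x: phi_eval(x, u0), x0)
    B = B_analytical + jacobian_fd(lambda u: phi_eval(x0, u), u0)
    return A, B
</code>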

The equilibrium configuration $\bar{x}_0,\bar{u}_0$ is found by solving the equation

\begin{equation*} 0 = f(\bar{x}_0,\bar{u}_0) + \phi(\bar{x}_0,\bar{u}_0). \end{equation*}
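This root-finding step could, for example, be carried out numerically as sketched below. Solving only for the control at a fixed hover state and using a least-squares formulation are simplifying assumptions made here for illustration.

<code python>
# Sketch: equilibrium control for the improved model at a fixed state x0,
# found by driving f_bar(x0, u) to zero in a least-squares sense.
import numpy as np
from scipy.optimize import least_squares

def find_equilibrium_input(f_bar, x0, u_guess):
    """Solve 0 = f_bar(x0, u) for the equilibrium control u0."""
    residual = lambda u: np.asarray(f_bar(x0, u))
    result = least_squares(residual, u_guess)
    return result.x  # u0
</code>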

<latex>{\fontsize{12pt}\selectfont \textbf{Trajectory Optimization}} </latex>

The goal of trajectory optimization is to find state and control trajectories $x^*(t),u^*(t)$ by solving an optimization problem given by

\begin{equation*} \begin{aligned} & \underset{x,u}{\text{minimize}} & & J(x,u) = \int_{t_0}^{t_f} u^T R u + x^T Q x \ dt \\ & \text{subject to} & & \dot{x} = f(x,u)+\phi(x,u) \\ & & & x(t_0) = x_0, \quad x(t_f) = x_f \\ & & & x \in \mathcal{X}, \quad u \in \mathcal{U}, \end{aligned} \end{equation*}

where $x_0$ and $x_f$ are the initial and final states respectively, and $\mathcal{X}$ and $\mathcal{U}$ are the sets of admissible states and control actions respectively. Note that the improved model can simply be included as a dynamic constraint.

The above problem is then solved as a numerical optimization problem using direct transcription3). The resulting optimal trajectories $x^*(t),u^*(t)$ are part of an open-loop control policy. In order to achieve a stable control strategy, we formulate a time-varying LQR problem by linearizing the improved model $\bar{f}(x,u)$ around the optimal trajectory to obtain a feedback control policy. It can be shown4) that the optimal solution of the trajectory stabilization problem is given by the time-varying state feedback control law

\begin{equation*} u(t) = u^*(t) -K(t) (x(t)-x^*(t)). \end{equation*}
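As a rough sketch of the transcription step described above, the problem could be set up with CasADi's Opti stack and trapezoidal integration as shown below. The tool choice, the grid size, and the wrapper `f_bar_casadi` returning a symbolic expression for the improved dynamics are assumptions for illustration, not the project's actual implementation; the resulting $x^*,u^*$ would then be fed to the time-varying LQR.

<code python>
# Sketch: direct transcription of the trajectory optimization problem with
# trapezoidal integration, using CasADi's Opti stack as one possible tool.
import casadi as ca

def transcribe(f_bar_casadi, nx, nu, x_init, x_final, N=50, T=5.0, q=1.0, r=1.0):
    opti = ca.Opti()
    X = opti.variable(nx, N + 1)   # state knot points
    U = opti.variable(nu, N)       # control knot points
    dt = T / N

    cost = 0
    for k in range(N):
        cost += dt * (r * ca.sumsqr(U[:, k]) + q * ca.sumsqr(X[:, k]))
        # Trapezoidal integration of the improved dynamics f + phi
        f_k = f_bar_casadi(X[:, k], U[:, k])
        f_k1 = f_bar_casadi(X[:, k + 1], U[:, min(k + 1, N - 1)])
        opti.subject_to(X[:, k + 1] == X[:, k] + 0.5 * dt * (f_k + f_k1))

    opti.subject_to(X[:, 0] == x_init)    # x(t_0) = x_0
    opti.subject_to(X[:, N] == x_final)   # x(t_f) = x_f
    opti.minimize(cost)
    opti.solver("ipopt")
    sol = opti.solve()
    return sol.value(X), sol.value(U)
</code>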

<latex>{\fontsize{12pt}\selectfont \textbf{Quadrotor Simulation Results}} </latex>

In order to verify our proposed approach, a quadrotor is used as an example system, and simulated in RotorS5). The simulation is based on a rich model of the quadrotor dynamics including aerodynamic effects, actuation dynamics, and an optional state estimator based on noisy measurements. Furthermore, we increase the system's mass and inertia to mimic a parametric model error. The simulated system is considered the 'real system', and experimental data is drawn from it using a position controller to track different points in the task space.

First, let us compare the performance of infinite-horizon LQR controllers for hovering at $(0,0,1)$. The figure below compares an LQR controller based on the analytical model (left) with an LQR controller based on the improved model (right). The golden star indicates the goal state.

The learned model error absorbs the impact of the mass increase, and the resulting LQR controller brings the system closer to the goal state. The figures below show the position of the quadrotor's center of mass during take-off and hover for both controllers using ground truth states (left), and estimated states (right), where the analytical-model-based design is blue, and the improved-model-based design is red.

The RMS error in z-direction can be reduced by 6.4% with ground truth, and 15.4% with state estimation using our approach.

Furthermore, trajectory optimization performance is also compared. We specify a go-to task from $x_0 = (0,0,1)$ to $x_f = (5,5,5)$, minimizing control energy and maneuver aggressiveness by choosing $Q$ and $R$ accordingly. Additionally, we choose $\mathcal{U}$ to constrain the control action within the minimum and maximum propeller speeds.

The figures below show the position error (left), and the required feedback control action (right) for the design based on the analytical model (blue), and the one based on the improved model (red).

Clearly, both the error between the optimal and the simulated trajectory and the required feedback action are decreased with our proposed design. Furthermore, the cost function evaluated along the trajectories is reduced by 18%, as shown in the figure on the right below: the nominal design (blue) has a significantly higher cost than our design (yellow). The RMS trajectory tracking errors and feedback control effort, and their relative improvements by our design, are listed in the table on the left below.

Finally, the strength of trajectory optimization lies in the fact that nearly any constraint can be included in the framework. To evaluate the performance of our proposed approach under such constraints, we introduce a cylindrical spatial path constraint

\begin{equation*} (y-y_c)^2 + (z-z_c)^2 \geq (r+\epsilon)^2, \end{equation*}

where $(y_c,z_c)$ is the center of the cylinder, and $(r+\epsilon)$ is its radius including an additional safety distance. The figures below show the optimal solutions (red) and the simulated trajectories (blue) for the analytical model (left) and the improved model (right) with the cylindrical constraint. The constraint enters the trajectory optimization as an inequality constraint, as sketched below.
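Continuing the transcription sketch above, the keep-out constraint could be imposed at every knot point as follows; the state indices used for the $y$ and $z$ positions are assumptions for illustration.

<code python>
# Sketch: cylindrical keep-out region added to the Opti problem from the
# transcription sketch as an inequality constraint at every knot point.
def add_cylinder_constraint(opti, X, y_c, z_c, r, eps, N):
    """Impose (y - y_c)^2 + (z - z_c)^2 >= (r + eps)^2 at every knot."""
    for k in range(N + 1):
        y, z = X[1, k], X[2, k]   # assumed y- and z-position entries of the state
        opti.subject_to((y - y_c) ** 2 + (z - z_c) ** 2 >= (r + eps) ** 2)
</code>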

As can be seen clearly, the nominal design results in a trajectory that would collide with the obstacle, even though the optimal solution is consistent with the constraint. By including the learned model error, both the optimal solution, and the simulation trajectory clear the obstacle. Again, the RMS trajectory tracking errors and feedback control effort (left), and the cost function evaluated along the trajectories (right) are compared below.

The RMS errors are significantly reduced, less feedback action is required, and the cost is reduced by 49% with our approach.

<latex>{\fontsize{12pt}\selectfont \textbf{Conclusion}} </latex>

In this project we showed that analytical RBD models can be combined with a learning-based nonlinear model error approximation to obtain a hybrid model that more accurately represents the real system. Furthermore, we showed that such a hybrid model can be included in state-of-the-art optimal control schemes, leading to improved performance. LWPR proved to be a suitable choice for the supervised learning of RBD model errors.

Our approach is verified on a quadrotor UAV in simulation, where we can see that hybrid model-based control strategies lead to decreased costs, more accurate trajectory tracking, and less feedback action.

However, a more detailed analysis also revealed limitations of this approach. For example, hidden states cannot be appropriately captured by our additive model error. Also, in contrast to what we expected, there is no guarantee that the control performance will improve over a nominal design in unexplored regions of the input space. Interested readers are referred to the full report of this project.

<latex>{\fontsize{12pt}\selectfont \textbf{Future Work}} </latex>

This project introduces a very general framework, and there are many opportunities for future work, including:

  • Verification of the results on a real quadrotor system
  • Verification of the approach on different, perhaps more complex robots
  • Development of safety guarantees to address issues with generalization
  • Replacement and comparison of LWPR with other machine learning tools for nonlinear regression
  • Use of online learning
  • Implementation in other model-based control strategies
  • Implementation in optimal estimation
1)
Sethu Vijayakumar and Stefan Schaal. Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In International conference on machine learning, proceedings of the sixteenth conference, 2000.
2) , 4)
Russ Tedrake. Underactuated Robotics: Algorithms for Walking, Running, Swimming, Flying, and Manipulation (Course Notes for MIT 6.832). Downloaded on 12-Aug-2016 from http://underactuated.mit.edu/
3)
Diego Pardo, Lukas Möller, Michael Neunert, Alexander W Winkler, and Jonas Buchli. Evaluating direct transcription and nonlinear optimization methods for robot motion planning. IEEE Robotics and Automation Letters, 1(2):946-953, 2016.
5)
Fadri Furrer, Michael Burri, Markus Achtelik, and Roland Siegwart. RotorS - A modular Gazebo MAV simulator framework. In Robot Operating System (ROS): The Complete Reference, Volume 1, Chapter 23. Springer, 2016.