LAB

<latex>{\fontsize{16pt}\selectfont \textbf{Optimization of Stereotypical Trotting Gait on HyQ}} </latex>

<latex>{\fontsize{12pt}\selectfont \textbf{Brahayam David Ponton Junes}} </latex>
<latex>{\fontsize{10pt}\selectfont \textit{Master Project RSC}} </latex>

<latex> {\fontsize{12pt}\selectfont \textbf{Abstract}} </latex>

Over the last decades, the locomotion of legged robots has become a very active field of research because of the versatility such robots would offer in many applications. With very few exceptions, legged robot experiments are performed in controlled lab environments. One reason for this limited use is that, in the real world, legged robots have to interact with an unknown environment, and to do so successfully and safely they need to be compliant, as humans and animals are. This project proposes a framework for learning an optimized stereotypical trotting gait for the Hydraulic Quadruped robot HyQ using variable impedance. This is an important step towards closing the gap between robot capabilities and nature’s approach to animal locomotion.

<latex> {\fontsize{12pt}\selectfont \textbf{Hydraulic Quadruped Robot - HyQ}} </latex>

HyQ (Picture from IIT)	Characteristics of HyQ
	HyQ is a hydraulically powered quadruped robot. It was developed at the Italian Institute of Technology (IIT) as a platform to study legged locomotion, both in highly dynamic motions and in careful navigation over rough terrain. It has 12 active joints, each torque- or position-controlled and each with a range of motion of 120 degrees. It weighs 70 kg and is 1 m tall.

<latex> {\fontsize{12pt}\selectfont \textbf{Reactive Controller Framework - IIT}} </latex>

RCF modules

Brief overview of RCF modules (Picture from IIT)

In this project, the base controller on which the parameter optimization is performed is the Reactive Controller Framework (RCF) ¹⁾, which has been designed for robust quadrupedal locomotion. It is composed of two modules: the first is dedicated to generating elliptical trajectories for the feet, whereas the second controls the stability of the robot.

WCPG Generator: The elliptical trajectories are parametrized by a height $F_{c_{i}}$, a length $L_{s}$, a desired forward velocity $V_{f}$, a duty cycle $D$ and a step frequency $\omega_{s}$. Some of these variables are related by the equation $\omega_{s} = \frac{V_{f}}{L_{s}}D$.

Tracking Controller: The elliptical trajectories generated in Cartesian coordinates are transformed into desired joint space trajectories by an inverse kinematics transformation. The trajectory tracking controller receives the desired joint space trajectories $q_{d}$, $\dot{q}_{d}$, $\ddot{q}_{d}$ and uses an inverse dynamics algorithm to provide feed-forward commands, and a PD position and torque controller to provide feedback commands.

Velocity Estimation: This sub-module estimates the translational velocities. Angular velocities and accelerations can be measured directly by the gyroscopes and the inertial measurement unit (IMU). The body velocities are estimated by mapping the joint velocities of the stance legs, assuming that there is no slip or that the friction force constrains the forward movement of the feet in the stance phase.

Trunk Stabilization: The trunk stabilization block performs gravity compensation and also compensates the deviations of the roll and pitch angles with respect to a horizontal frame using a PD controller.
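The relation between step frequency, forward velocity, step length and duty cycle can be checked numerically. The following is only a minimal sketch: the function names, the numeric values and the choice of an upper half-ellipse for the swing phase are assumptions made for illustration, not the RCF implementation.

```python
import math


def step_frequency(v_f, l_s, duty):
    """Step frequency from omega_s = (V_f / L_s) * D, as stated in the text."""
    if l_s <= 0:
        raise ValueError("step length must be positive")
    return v_f / l_s * duty


def swing_ellipse(l_s, f_c, n=5):
    """Sample the upper half of an ellipse of length L_s and height F_c.

    Assumption: the swing phase runs from the rear end (-L_s/2, 0) to the
    front end (+L_s/2, 0); the real trajectory generator may differ.
    """
    points = []
    for k in range(n):
        phase = math.pi * k / (n - 1)      # 0 .. pi over the swing
        x = -0.5 * l_s * math.cos(phase)   # moves from rear to front
        z = f_c * math.sin(phase)          # reaches F_c at mid-swing
        points.append((x, z))
    return points


# Hypothetical values: 0.5 m/s forward velocity, 0.25 m length, duty cycle 0.5.
omega_s = step_frequency(v_f=0.5, l_s=0.25, duty=0.5)
apex = swing_ellipse(l_s=0.25, f_c=0.08)[2]   # middle sample of the swing
```

With these made-up numbers the step frequency evaluates to 1.0, and the middle sample of the swing sits at the full height of 0.08 above the centre of the ellipse.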

<latex> {\fontsize{12pt}\selectfont \textbf{Learning and Control Strategy}} </latex>

This section describes the design and control architecture implemented for gait optimization using the reinforcement learning algorithm PI2 (Policy Improvement with Path Integrals). The learning algorithm is built on top of the physics and control environment SL (Simulation Laboratory) and uses an optimization engine based on reinforcement learning algorithms.

In brief, the learning and control setup rests on the following key ideas. First, online learning of frequency and phase is performed to synchronize the feedback control policies (Control and Adaptation Layer) ²⁾ ³⁾. The frequency is identified by adaptive frequency oscillators that synchronize to the roll angle of the robot (a periodic variable of the locomotion gait that carries information about its frequency). The identified frequency and a phase resetting mechanism ⁴⁾ are then used to identify the phase of the system. The phase resetting mechanism performs an event-based correction, using the foot contacts as events. Then, by executing and evaluating roll-outs, the feedback control policies for variable impedance control and trunk stabilization are tested and improved (Learning Layer). A roll-out is a single execution of the policy with a given set of parameters.
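To make the idea of improving a policy from evaluated roll-outs concrete, the sketch below shows an exponentiated-cost weighted parameter update in the spirit of PI2. It is a simplified, episodic stand-in and an assumption on our part: the full algorithm weights each time step by its cost-to-go, and `toy_cost`, the noise level and the sensitivity `h` are hypothetical placeholders for the simulated trotting roll-out and its cost.

```python
import math
import random


def pi2_style_update(theta, noises, costs, h=10.0):
    """One exponentiated-cost weighted update of the policy parameters.

    theta  : current parameter vector (list of floats)
    noises : one exploration-noise vector per roll-out
    costs  : one scalar cost per roll-out
    h      : sensitivity of the weighting (assumed value)
    """
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0                        # avoid division by zero
    weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]           # low cost -> high probability
    return [t + sum(p * eps[i] for p, eps in zip(probs, noises))
            for i, t in enumerate(theta)]


def toy_cost(params):
    """Stand-in for a simulated roll-out: squared distance to a made-up optimum."""
    target = [1.0, -0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


rng = random.Random(0)
theta = [0.0, 0.0]
initial_cost = toy_cost(theta)
for _ in range(50):                                # 50 updates of 10 roll-outs each
    noises = [[rng.gauss(0.0, 0.2) for _ in theta] for _ in range(10)]
    costs = [toy_cost([t + e for t, e in zip(theta, eps)]) for eps in noises]
    theta = pi2_style_update(theta, noises, costs)
final_cost = toy_cost(theta)
```

Roll-outs with lower cost receive a larger share of the update, so the parameters drift towards the perturbations that performed best without any gradient of the cost being computed.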

Control and Adaptation Layer

Learning Layer

The first row of this picture shows an extract of the roll angle of the robot. The second row shows the reference frequency (used for the generation of elliptical trajectories) and the frequency of the roll angle (identified by Adaptive Frequency Oscillators). The third row shows the phase estimation using the phase resetting mechanism, which corrects the slight difference between the frequency of the generated trajectories and the frequency at which the legs touch the ground.
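The frequency identification can be illustrated with a single adaptive Hopf oscillator of the kind described in ²⁾. This is a minimal sketch under stated assumptions: the roll angle is replaced by a synthetic sine wave, the constants `gamma`, `mu` and `eps` are illustrative values rather than those used on HyQ, and the integration is plain explicit Euler.

```python
import math


def identify_frequency(signal, omega0, dt, duration,
                       gamma=8.0, mu=1.0, eps=1.0):
    """Integrate one adaptive Hopf oscillator driven by signal(t).

    Returns the learned frequency omega in rad/s.
    """
    x, y, omega = 1.0, 0.0, omega0
    for k in range(int(duration / dt)):
        f = signal(k * dt)
        r = math.hypot(x, y)
        dx = gamma * (mu - r * r) * x - omega * y + eps * f
        dy = gamma * (mu - r * r) * y + omega * x
        domega = -eps * f * y / r                  # frequency adaptation rule
        x, y, omega = x + dt * dx, y + dt * dy, omega + dt * domega
    return omega


true_omega = 2.0 * math.pi * 1.5                   # hypothetical 1.5 Hz roll oscillation


def roll(t):
    """Synthetic stand-in for the measured roll angle."""
    return math.sin(true_omega * t)


omega0 = true_omega - 2.0                          # deliberately wrong initial guess
omega = identify_frequency(roll, omega0, dt=1e-3, duration=60.0)
```

The oscillator state `(x, y)` also provides a phase estimate via `math.atan2(y, x)`; the event-based phase resetting on foot contact described above is not part of this sketch.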

The cost function drives the learning of two sets of parameters: the WCPG parameters and the gain parameters. It penalizes speed tracking errors, energy consumption, proximity to the joint limits, high feed-forward torques and variance along the pitch-roll trajectories. This last term is important because it guides the learning towards trajectories that tend to a stable limit cycle, which makes the trotting gait robust.
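As an illustration of how such a cost could be assembled, the sketch below combines named penalty terms in a weighted sum and computes the variance term as the spread between gait cycles at matching phase samples. This reading of the variance term, the term names and all numbers are assumptions; the source does not give the exact formulas.

```python
from statistics import pvariance


def cycle_variance(cycles):
    """Mean variance across gait cycles at matching phase samples.

    `cycles` is a list of equally long lists, one per gait cycle. A trajectory
    that repeats exactly (a perfect limit cycle) gives zero.
    """
    return sum(pvariance(samples) for samples in zip(*cycles)) / len(cycles[0])


def rollout_cost(terms, weights):
    """Weighted sum of named penalty terms (names and weights are hypothetical)."""
    return sum(weights[name] * value for name, value in terms.items())


roll_cycles = [[0.00, 0.02, 0.00, -0.02],
               [0.00, 0.02, 0.00, -0.02]]          # two identical cycles
pitch_cycles = [[0.01, 0.00, -0.01, 0.00],
                [0.03, 0.00, -0.03, 0.00]]         # two cycles that differ
terms = {
    "speed_error": 0.04,
    "energy": 0.30,
    "joint_limits": 0.10,
    "ff_torque": 0.20,
    "variance": cycle_variance(roll_cycles) + cycle_variance(pitch_cycles),
}
weights = {name: 1.0 for name in terms}
cost = rollout_cost(terms, weights)
```

Only the pitch cycles contribute to the variance term here, because the roll cycles repeat exactly.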

<latex> {\fontsize{12pt}\selectfont \textbf{Experiments and Results}} </latex>

This section presents some of the experiments and results obtained in simulation.

Optimization Example

The cost function was designed to be as simple as possible, yet expressive enough to perform a multi-criterion optimization efficiently. Because several criteria are optimized at once, the result of the optimization is a trade-off between the different objectives (a Pareto-optimal solution). For this reason, the weights were selected so that the different objectives contribute to the total cost with the same order of magnitude. In this way, all objectives are optimized and none is dominated by the others.
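One simple way to obtain weights with this property is to scale each term by the inverse of its typical magnitude. The sketch below assumes that the magnitudes are measured on an initial roll-out; the source does not state how the weights were actually chosen, and the term names and numbers are hypothetical.

```python
def balance_weights(nominal_terms):
    """Pick weights so every term contributes about 1.0 at its nominal value."""
    for name, value in nominal_terms.items():
        if value <= 0:
            raise ValueError(f"nominal value of {name!r} must be positive")
    return {name: 1.0 / value for name, value in nominal_terms.items()}


# Hypothetical term magnitudes measured on an initial, untuned roll-out.
nominal = {
    "speed_error": 0.05,
    "energy": 400.0,
    "joint_limits": 2.0,
    "ff_torque": 2500.0,
    "variance": 0.001,
}
weights = balance_weights(nominal)
contributions = {name: weights[name] * nominal[name] for name in nominal}
```

After balancing, each term contributes about 1.0 to the total cost of the initial roll-out, so no single objective dominates at the start of learning.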

This figure shows how the robot learns a compliant policy. It reduces the stiffness needed during the swing phase, so that when the leg touches down it can interact compliantly enough with the environment. The effect of the compliance given by the policy can be seen in the trajectory tracking performance of one of the joints, as shown in the second plot of the figure.
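A variable impedance schedule of this kind can be pictured as PD gains that depend on the gait phase and are added to the feed-forward torque. The sketch below is an assumption-laden illustration: the piecewise-linear schedule, the knot values and the function names are hypothetical and do not reproduce the learned policy.

```python
def scheduled_gain(phase, knots):
    """Linearly interpolate a gain over the gait phase in [0, 1).

    `knots` is a list of (phase, gain) pairs sorted by phase; the schedule
    wraps around so the last knot connects back to the first.
    """
    phase %= 1.0
    pts = knots + [(knots[0][0] + 1.0, knots[0][1])]
    if phase < pts[0][0]:
        phase += 1.0
    for (p0, g0), (p1, g1) in zip(pts, pts[1:]):
        if p0 <= phase <= p1:
            return g0 + (g1 - g0) * (phase - p0) / (p1 - p0)
    raise ValueError("knots must be sorted by phase")


def joint_torque(phase, q, dq, q_d, dq_d, tau_ff, kp_knots, kd_knots):
    """Feed-forward torque plus PD feedback with phase-dependent gains."""
    kp = scheduled_gain(phase, kp_knots)
    kd = scheduled_gain(phase, kd_knots)
    return tau_ff + kp * (q_d - q) + kd * (dq_d - dq)


# Hypothetical schedule: stiffer around phase 0.0, softer around phase 0.5.
kp_knots = [(0.0, 300.0), (0.5, 100.0)]
kd_knots = [(0.0, 6.0), (0.5, 2.0)]
tau = joint_torque(0.25, q=0.10, dq=0.0, q_d=0.15, dq_d=0.0,
                   tau_ff=5.0, kp_knots=kp_knots, kd_knots=kd_knots)
```

In a learning setup the knot values would be the policy parameters being optimized; lower gains make the same tracking error produce a smaller corrective torque, which is what makes the joint compliant.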

<latex> {\fontsize{12pt}\selectfont \textbf{Conclusions}} </latex>

* The algorithm directly optimizes the feedback terms by learning variable impedance schedules for the robot-environment interaction, together with the trunk stabilization parameters. It also learns the feed-forward terms indirectly by optimizing the WCPG parameters that generate the desired elliptical foot trajectories. The algorithm has scaled well to this high-dimensional problem, in which the parameters are optimized for the entire locomotion cycle (stance and flight phase).

* The learning algorithm has generated policies for different locomotion speeds, achieving a stable locomotion gait with a limit cycle and an energy-efficient locomotion frequency.

* Specifying a target impedance by hand is not trivial, which is why it is learned. The learning algorithm has learned a variable impedance schedule that gives the robot the compliance needed for the interaction with the environment. It provides enough stiffness during the swing phase and compliance during the stance phase, and in this way trades off the two leg objectives: high-performance trajectory tracking and robust interaction with the environment.

* The algorithm has not yet been tested on the real robot. The next step, in order to validate the results obtained in this project, is therefore to perform learning on the real robot. This will make it possible to push HyQ to its performance limits while also taking unmodelled dynamics into account.

¹⁾ Victor Barasuol, Jonas Buchli, Claudio Semini, Marco Frigerio, Edson De Pieri and Darwin Caldwell, A reactive controller framework for quadrupedal locomotion on challenging terrain.

²⁾ Ludovic Righetti, Jonas Buchli and Auke Jan Ijspeert, Dynamic Hebbian learning in adaptive frequency oscillators.

³⁾ Ludovic Righetti, Control of legged locomotion using dynamical systems: Design methods and adaptive frequency oscillators.

⁴⁾ Jun Nakanishi, Jun Morimoto, Gen Endo, Gordon Cheng, Stefan Schaal and Mitsuo Kawato, Learning from demonstration and adaptation of biped locomotion.
