GMT: General Motion Tracking for Humanoid Whole-Body Control

Zixuan Chen*, Mazeyu Ji*, Xuxin Cheng, Xuanbin Peng, Xue Bin Peng†, Xiaolong Wang†
* Equal contribution    † Equal advising

Long Skill Sequencing

Uncut version of motion tracking on long skill sequences.

Agile Skills


Kungfu

Airkick


Ready Kick

Kick & Walk


High Kick

Soccer Shoot

Dancing




Stylized Locomotion


Crouch Walk

Punch & Stand


Loco-Manip (Walk & Squat)

Pass


Drunk Walk

Spinning


Squatting

Side Step


Throw

Stretch


Stylized Walk

Warm Up


Stylized Walk

Stomping

Abstract

Tracking general whole-body motions in the real world is a useful way to build general-purpose humanoid robots. However, this is challenging because of the temporal and kinematic diversity of the motions, the limited capability of the policy, and the difficulty of coordinating the upper and lower body. To address these issues, we propose GMT, a general and scalable motion-tracking framework that trains a single unified policy to enable humanoid robots to track diverse motions in the real world. GMT is built upon two core components: an Adaptive Sampling strategy and a Motion Mixture-of-Experts (MoE) architecture. Adaptive Sampling automatically balances easy and difficult motions during training. The MoE ensures better specialization in different regions of the motion manifold. Extensive experiments in both simulation and the real world demonstrate the effectiveness of GMT, which achieves state-of-the-art performance across a broad spectrum of motions using a unified general policy.
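
The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify either rule. The sketch assumes that Adaptive Sampling draws motions with probability proportional to a per-motion tracking error, mixed with a uniform floor so easy motions are still revisited, and that the MoE blends expert action vectors with softmax gate weights. All names here (`sampling_probs`, `sample_motion`, `moe_action`, `floor`) are hypothetical.

```python
import math
import random


def sampling_probs(errors, floor=0.1):
    """Map per-motion tracking errors to sampling probabilities.

    Assumed rule (not from the source): a fraction `floor` of the mass is
    spread uniformly, the rest is proportional to each motion's error.
    """
    n = len(errors)
    total = sum(errors)
    if total <= 0:
        # No error signal yet: fall back to uniform sampling.
        return [1.0 / n] * n
    return [(1.0 - floor) * e / total + floor / n for e in errors]


def sample_motion(errors, floor=0.1, rng=random):
    """Draw one motion index, favoring motions with higher tracking error."""
    probs = sampling_probs(errors, floor)
    return rng.choices(range(len(errors)), weights=probs, k=1)[0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def moe_action(gate_logits, expert_actions):
    """Blend expert action vectors using softmax gate weights (assumed form)."""
    weights = softmax(gate_logits)
    dim = len(expert_actions[0])
    return [
        sum(w * action[d] for w, action in zip(weights, expert_actions))
        for d in range(dim)
    ]
```

For example, `sample_motion([0.02, 0.30, 0.08])` would pick the second (hardest) motion most often, while the floor term keeps the other two in rotation.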

Q&A

Q: What distinguishes GMT from previous works such as HumanPlus, OmniH2O, ExBody2, and ASAP?
A: HumanPlus and OmniH2O focus on loco-manipulation and exhibit limited lower-body motion. ExBody2 operates on a small-scale, manually curated motion dataset and requires fine-tuning for each motion category. ASAP emphasizes addressing the sim-to-real gap in agile motion deployment, but relies on training a separate policy for each individual motion. GMT, in contrast, is designed to be a general motion tracking framework, capable of tracking a wide range of motions with high fidelity using a single unified policy.

BibTeX


@article{chen2025gmt,
title={GMT: General Motion Tracking for Humanoid Whole-Body Control},
author={Chen, Zixuan and Ji, Mazeyu and Cheng, Xuxin and Peng, Xuanbin and Peng, Xue Bin and Wang, Xiaolong},
journal={arXiv:2506.14770},
year={2025}
}