Laplacian Representations for Decision-Time Planning
¹University of Alberta, ²Alberta Machine Intelligence Institute (Amii), ³University of Calgary, ⁴Canada CIFAR AI Chair
Planning with a learned model remains a key challenge in model-based reinforcement learning because of the compounding-error problem. In decision-time planning, the state representation is critical: it must support local cost computation while preserving long-horizon temporal structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning because it captures state-space distances at multiple time scales. The Laplacian representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, thereby mitigating the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and show that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.
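The requirement that the representation "support local cost computation" can be illustrated with a one-step, random-shooting action selector that scores each candidate action by the ψ-space distance between the predicted next state and a subgoal. This is a hypothetical sketch, not ALPS's low-level planner. `select_action`, `point_model`, and `uniform_actions` are illustrative names. The identity map stands in for the learned ψ, a hand-written point-mass model stands in for the learned one-step forward model, and uniform sampling stands in for the behavior prior π_prior.

```python
import numpy as np

def select_action(state, subgoal_psi, psi, forward_model, sample_actions,
                  n_samples=64, rng=None):
    """Random-shooting, one-step lookahead toward a subgoal in psi-space.

    psi, forward_model and sample_actions are hypothetical stand-ins for a
    learned representation, a learned one-step model and a behavior prior.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = sample_actions(state, n_samples, rng)  # candidates, shape (n, action_dim)
    next_states = np.stack([forward_model(state, a) for a in actions])  # one-step predictions
    # Local cost: Euclidean distance between each predicted next state and the subgoal in psi-space.
    costs = np.linalg.norm(psi(next_states) - subgoal_psi, axis=-1)
    best = int(np.argmin(costs))
    return actions[best], costs[best]

# Toy usage: a 2-D point mass with an identity psi (an assumption, not the learned representation).
def point_model(s, a):
    return s + 0.1 * a

def uniform_actions(s, n, rng):
    return rng.uniform(-1.0, 1.0, size=(n, 2))

state = np.zeros(2)
subgoal = np.array([1.0, 0.0])
action, cost = select_action(state, subgoal, lambda x: x, point_model,
                             uniform_actions, rng=np.random.default_rng(0))
```

Because the cost is computed only one step ahead, the representation's distances, not a long model rollout, carry the information about progress toward the subgoal.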
Overview of ALPS: In pre-training, ALPS (1) learns the Laplacian representation using the Augmented Lagrangian Laplacian Objective (ALLO), (2) learns a one-step forward model in the original state space, (3) learns the behavior prior π_prior using the scaled Laplacian representation, and (4) clusters the dataset with k-means in the scaled Laplacian space to build the cluster graph. At planning time, (5) ALPS takes the current state from the environment, uses the high-level planner to choose the next subgoal, and then uses the low-level planner to choose the next action toward that subgoal.
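Steps (4) and (5) can be sketched as follows, under stated assumptions. The sketch runs plain Lloyd's k-means on precomputed ψ vectors, connects two clusters whenever a dataset transition crosses between them (treated as undirected, an assumption), and picks the next subgoal cluster as the first hop on a breadth-first shortest path. The caption does not say how ALPS weights or searches its cluster graph, so `kmeans`, `build_cluster_graph`, and `next_subgoal` are hypothetical helpers, not the authors' implementation.

```python
import numpy as np
from collections import deque

def kmeans(x, k, n_iters=50, seed=0):
    """Plain Lloyd's k-means on psi-space vectors x of shape (n, d)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # fancy indexing returns a copy
    for _ in range(n_iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)  # assign each point to its nearest center
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    labels = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
    return centers, labels

def build_cluster_graph(traj_labels, k):
    """Connect clusters a and b if some transition s_t -> s_{t+1} crosses between them."""
    adj = {i: set() for i in range(k)}
    for labels in traj_labels:  # one label array per trajectory
        for a, b in zip(labels[:-1], labels[1:]):
            if a != b:
                adj[int(a)].add(int(b))
                adj[int(b)].add(int(a))  # undirected edges (an assumption)
    return adj

def next_subgoal(adj, start, goal):
    """Return the first cluster after `start` on a BFS shortest path to `goal`."""
    if start == goal:
        return goal
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    if goal not in parent:
        return None  # goal cluster is unreachable in the cluster graph
    node = goal
    while parent[node] != start:  # walk back to the hop right after start
        node = parent[node]
    return node

# Toy usage with hypothetical 1-D psi values for a single trajectory.
psi_traj = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
centers, labels = kmeans(psi_traj, k=2)
graph = build_cluster_graph([labels], k=2)

# A chain of clusters 0-1-2-3: the first waypoint toward cluster 3 is cluster 1.
chain = build_cluster_graph([np.array([0, 0, 1, 1, 2, 2, 3])], k=4)
subgoal_cluster = next_subgoal(chain, start=0, goal=3)
```

In a full loop, `centers[subgoal_cluster]` would serve as the ψ-space target handed to a low-level action selector.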
Properties of the Laplacian representation: (Left) The scaled Laplacian representation ψ is isometric to the commute-time distance in the data graph, so distances in ψ-space reflect how hard it is to navigate between states. (Right) k-means clustering in ψ-space groups states into topologically connected regions that serve as subgoal waypoints.
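The commute-time property in the left panel has a classical closed form on a known graph. With graph Laplacian L = D − A, volume vol(G) = Σ_i d_i, and eigenpairs (λ_k, u_k), the embedding ψ_k(i) = √vol(G) · u_k(i) / √λ_k over the nonzero eigenvalues satisfies ‖ψ(i) − ψ(j)‖² = vol(G)(L⁺_ii + L⁺_jj − 2L⁺_ij), which is the commute time between i and j. The sketch checks this numerically on a three-node path graph using an exact eigendecomposition. ALPS instead learns its representation from data with ALLO, and treating this exact scaling as the paper's "scaled" representation is an assumption here.

```python
import numpy as np

def scaled_laplacian_embedding(adj):
    """psi_k(i) = sqrt(vol) * u_k(i) / sqrt(lambda_k) for the nonzero Laplacian eigenvalues."""
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj
    vol = deg.sum()
    eigvals, eigvecs = np.linalg.eigh(lap)
    keep = eigvals > 1e-9  # drop the constant eigenvector (eigenvalue 0) of a connected graph
    return np.sqrt(vol) * eigvecs[:, keep] / np.sqrt(eigvals[keep])

def commute_times(adj):
    """Commute time c(i, j) = vol * (L+_ii + L+_jj - 2 L+_ij) via the Laplacian pseudoinverse."""
    deg = adj.sum(axis=1)
    lap_pinv = np.linalg.pinv(np.diag(deg) - adj)
    vol = deg.sum()
    diag = np.diag(lap_pinv)
    return vol * (diag[:, None] + diag[None, :] - 2.0 * lap_pinv)

# Path graph 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
psi = scaled_laplacian_embedding(adj)
sq_dists = ((psi[:, None, :] - psi[None, :, :]) ** 2).sum(axis=-1)
ct = commute_times(adj)
```

On this path graph every edge has unit effective resistance and vol(G) = 4, so adjacent nodes have commute time 4 and the two endpoints have commute time 8; the squared ψ-distances reproduce both.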
Success rate (%) on the OGBench locomotion and manipulation tasks considered. Results are averaged over 8 seeds (4 seeds for pixel-based tasks), and the value after the ± sign is the standard deviation. ALPS outperforms the model-free GCRL baselines with p < 0.001 under a two-sided Wilcoxon signed-rank test with Holm-Bonferroni correction. Bold marks the largest mean in each row as a visual aid.
| Environment | Dataset Type | Dataset | GCBC | GCIVL | GCIQL | QRL | CRL | HIQL | ALPS |
|---|---|---|---|---|---|---|---|---|---|
| pointmaze | navigate | pointmaze-medium-navigate-v0 | 9±6 | 63±6 | 53±8 | **82±5** | 29±7 | 79±5 | **82±10** |
| | | pointmaze-large-navigate-v0 | 29±6 | 45±5 | 34±3 | **86±9** | 39±7 | 58±5 | 80±8 |
| | | pointmaze-giant-navigate-v0 | 1±2 | 0±0 | 0±0 | **68±7** | 27±10 | 46±9 | 67±11 |
| | | pointmaze-teleport-navigate-v0 | 25±3 | **45±3** | 24±7 | 4±4 | 24±6 | 18±4 | 40±6 |
| | stitch | pointmaze-medium-stitch-v0 | 23±18 | 70±14 | 21±9 | 80±12 | 0±1 | 74±6 | **94±6** |
| | | pointmaze-large-stitch-v0 | 7±5 | 12±6 | 31±2 | 84±15 | 0±0 | 13±6 | **96±2** |
| | | pointmaze-giant-stitch-v0 | 0±0 | 0±0 | 0±0 | 50±8 | 0±0 | 0±0 | **98±1** |
| | | pointmaze-teleport-stitch-v0 | 31±9 | **44±2** | 25±3 | 9±5 | 4±3 | 34±4 | 13±4 |
| antmaze | navigate | antmaze-medium-navigate-v0 | 29±4 | 72±8 | 71±4 | 88±3 | 95±1 | 96±1 | **97±2** |
| | | antmaze-large-navigate-v0 | 24±2 | 16±5 | 34±4 | 75±6 | 83±4 | 91±2 | **93±5** |
| | | antmaze-giant-navigate-v0 | 0±0 | 0±0 | 0±0 | 14±3 | 16±3 | 65±5 | **69±9** |
| | | antmaze-teleport-navigate-v0 | 26±3 | 39±3 | 35±5 | 35±5 | **53±2** | 42±3 | 45±3 |
| | stitch | antmaze-medium-stitch-v0 | 45±11 | 44±6 | 29±6 | 59±7 | 53±6 | **94±1** | 93±7 |
| | | antmaze-large-stitch-v0 | 3±3 | 18±2 | 7±2 | 18±2 | 11±2 | 67±5 | **95±2** |
| | | antmaze-giant-stitch-v0 | 0±0 | 0±0 | 0±0 | 0±0 | 0±0 | 2±2 | **92±3** |
| | | antmaze-teleport-stitch-v0 | 31±6 | **39±3** | 17±2 | 24±5 | 31±4 | 36±2 | 35±11 |
| | explore | antmaze-medium-explore-v0 | 2±1 | 19±3 | 13±2 | 1±1 | 3±2 | 37±10 | **100±0** |
| | | antmaze-large-explore-v0 | 0±0 | 10±3 | 0±0 | 0±0 | 0±0 | 4±5 | **90±15** |
| | | antmaze-teleport-explore-v0 | 2±1 | 32±2 | 7±3 | 2±2 | 20±2 | 34±15 | **48±6** |
| humanoidmaze | navigate | humanoidmaze-medium-navigate-v0 | 8±2 | 24±2 | 27±2 | 21±8 | 60±4 | **89±2** | **89±5** |
| | | humanoidmaze-large-navigate-v0 | 1±0 | 2±1 | 2±1 | 5±1 | 24±4 | 49±4 | **56±5** |
| | | humanoidmaze-giant-navigate-v0 | 0±0 | 0±0 | 0±0 | 1±0 | 3±2 | 12±4 | **67±11** |
| | stitch | humanoidmaze-medium-stitch-v0 | 29±5 | 12±2 | 12±3 | 18±2 | 36±2 | **88±2** | 68±5 |
| | | humanoidmaze-large-stitch-v0 | 6±3 | 1±1 | 0±0 | 3±1 | 4±1 | 28±3 | **39±6** |
| | | humanoidmaze-giant-stitch-v0 | 0±0 | 0±0 | 0±0 | 0±0 | 0±0 | 3±2 | **62±6** |
| visual-antmaze | navigate | visual-antmaze-medium-navigate-v0 | 11±2 | 22±2 | 11±1 | 0±0 | **94±1** | 93±3 | **94±5** |
| | | visual-antmaze-large-navigate-v0 | 4±0 | 5±1 | 4±1 | 0±0 | 84±1 | 53±9 | **88±3** |
| | | visual-antmaze-giant-navigate-v0 | 0±1 | 1±1 | 0±0 | 0±0 | **47±2** | 6±4 | 36±7 |
| | | visual-antmaze-teleport-navigate-v0 | 5±1 | 8±1 | 6±1 | 6±3 | **48±2** | 37±2 | 47±3 |
| | stitch | visual-antmaze-medium-stitch-v0 | 67±4 | 6±2 | 2±0 | 0±0 | 69±2 | 87±2 | **95±2** |
| | | visual-antmaze-large-stitch-v0 | 24±3 | 1±1 | 0±0 | 1±1 | 11±3 | 28±0 | **90±2** |
| | | visual-antmaze-giant-stitch-v0 | 0±0 | 0±0 | 0±0 | 0±0 | 0±0 | 0±0 | **55±6** |
| | | visual-antmaze-teleport-stitch-v0 | 32±3 | 1±1 | 1±0 | 1±2 | 32±6 | **37±2** | 21±4 |
| | explore | visual-antmaze-medium-explore-v0 | 0±0 | 0±0 | 0±0 | 0±0 | 0±0 | 0±0 | **78±10** |
| | | visual-antmaze-large-explore-v0 | 0±0 | 1±0 | 0±0 | 0±0 | 1±0 | 0±0 | **26±10** |
| | | visual-antmaze-teleport-explore-v0 | 0±0 | 0±0 | 0±0 | 0±0 | 1±0 | 19±8 | **34±4** |
| cube | play | cube-single-play-v0 | 6±2 | 53±4 | **68±6** | 5±1 | 19±2 | 15±3 | **68±6** |
| | | cube-double-play-v0 | 1±1 | 36±5 | **40±5** | 1±0 | 10±2 | 6±2 | 2±1 |
| scene | play | scene-play-v0 | 5±1 | 42±4 | **51±4** | 5±1 | 19±2 | 38±3 | 26±3 |
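The significance claim in the table caption applies a Holm-Bonferroni step-down correction to a family of two-sided Wilcoxon signed-rank p-values. A minimal sketch of the correction follows. The per-comparison p-values would come from a paired test such as `scipy.stats.wilcoxon(x, y, alternative="two-sided")`; how the paper pairs scores (per task or per seed) is not stated here. `holm_bonferroni` is a hypothetical helper, and the p-values in the example are made-up placeholders, not results from the table.

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm's step-down procedure: returns (reject flags, adjusted p-values) in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    adjusted = [0.0] * m
    running_max = 0.0
    still_rejecting = True
    for rank, i in enumerate(order):
        # Adjusted p-value: (m - rank) * p, kept monotone and capped at 1.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
        # Reject while p_(rank) <= alpha / (m - rank); stop at the first failure.
        if still_rejecting and pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            still_rejecting = False
    return reject, adjusted

# Placeholder p-values for four hypothetical pairwise comparisons.
reject, adjusted = holm_bonferroni([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

Here the two smallest p-values pass their thresholds (0.05/4 and 0.05/3), the third fails 0.05/2, and the procedure stops. Equivalently, a hypothesis is rejected exactly when its adjusted p-value is at most alpha.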
@inproceedings{shehmar2026alps,
title = {Laplacian Representations for Decision-Time Planning},
author = {Shehmar, Dikshant and Schlegel, Matthew and Taylor, Matthew E.
and Machado, Marlos C.},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2026},
}