Accepted to ICML 2026!

Laplacian Representations for Decision-Time Planning

Dikshant Shehmar1,2 · Matthew Schlegel3 · Matthew E. Taylor1,2,4 · Marlos C. Machado1,2,4

1University of Alberta    2Alberta Machine Intelligence Institute (Amii)    3University of Calgary    4Canada CIFAR AI Chair

Teaser coming soon...

ALPS (Augmented Laplacian Planning with Subgoals) is a hierarchical planning algorithm that uses the Laplacian representation for subgoal identification and distance estimation. K-means clustering over the Laplacian embeddings produces a cluster graph that captures the topological structure of the state space. A high-level planner uses Dijkstra's algorithm to propose subgoals, while a low-level planner uses the learned forward model and behavior prior to efficiently reach these subgoals.

Abstract

Planning with a learned model remains a key challenge in model-based reinforcement learning due to the compounding error problem. In decision-time planning, state representations are critical as they must support local cost computation while preserving long-horizon temporal structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning by capturing state-space distances at multiple time scales. The Laplacian representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, thus mitigate the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and demonstrate that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.

Method

ALPS method overview

Overview of ALPS: In pre-training, ALPS (1) learns the Laplacian representation using the Augmented Lagrangian Laplacian Objective (ALLO), (2) learns a one-step forward model on top of the original state space, (3) learns the behavior prior πprior using the scaled Laplacian representation, and (4) clusters the dataset using k-means in the scaled Laplacian space to generate the cluster graph. In planning, (5) ALPS takes the current state from the environment and uses the high-level planner to determine the next subgoal, then determines the next action towards this subgoal using the low-level planner.

CTD distances and spectral clustering in ψ-space

Properties of Laplacian representation: (Left) The scaled Laplacian representation (ψ) is isometric to the commute-time distance in the data graph, so distances in ψ-space reflect how hard it is to navigate between states. (Right) k-means clustering in ψ-space groups states into topologically connected regions used as subgoal waypoints.

Videos

point-giant-stitch
ant-giant-stitch
ant-large-explore
humanoid-medium-navigate
visual-ant-large-navigate
cube-single-play

Results

Success rate (%) on each of the locomotion and manipulation tasks from OGBench considered. The results are averaged over 8 seeds (4 seeds for pixel-based tasks), and we report the standard deviation after the ± sign. ALPS outperforms the model-free GCRL algorithms with p < 0.001 using a Holm-Bonferroni-corrected two-sided Wilcoxon signed-rank test. Values in bold denote the largest mean in each row as a visual aid.

Environment Dataset Type Dataset GCBCGCIVLGCIQLQRLCRLHIQLALPS
pointmaze navigate pointmaze-medium-navigate-v0 9±663±653±882±529±779±582±10
pointmaze-large-navigate-v0 29±645±534±386±939±758±580±8
pointmaze-giant-navigate-v0 1±20±00±068±727±1046±967±11
pointmaze-teleport-navigate-v0 25±345±324±74±424±618±440±6
stitch pointmaze-medium-stitch-v0 23±1870±1421±980±120±174±694±6
pointmaze-large-stitch-v0 7±512±631±284±150±013±696±2
pointmaze-giant-stitch-v0 0±00±00±050±80±00±098±1
pointmaze-teleport-stitch-v0 31±944±225±39±54±334±413±4
antmaze navigate antmaze-medium-navigate-v0 29±472±871±488±395±196±197±2
antmaze-large-navigate-v0 24±216±534±475±683±491±293±5
antmaze-giant-navigate-v0 0±00±00±014±316±365±569±9
antmaze-teleport-navigate-v0 26±339±335±535±553±242±345±3
stitch antmaze-medium-stitch-v0 45±1144±629±659±753±694±193±7
antmaze-large-stitch-v0 3±318±27±218±211±267±595±2
antmaze-giant-stitch-v0 0±00±00±00±00±02±292±3
antmaze-teleport-stitch-v0 31±639±317±224±531±436±235±11
explore antmaze-medium-explore-v0 2±119±313±21±13±237±10100±0
antmaze-large-explore-v0 0±010±30±00±00±04±590±15
antmaze-teleport-explore-v0 2±132±27±32±220±234±1548±6
humanoidmaze navigate humanoidmaze-medium-navigate-v0 8±224±227±221±860±489±289±5
humanoidmaze-large-navigate-v0 1±02±12±15±124±449±456±5
humanoidmaze-giant-navigate-v0 0±00±00±01±03±212±467±11
stitch humanoidmaze-medium-stitch-v0 29±512±212±318±236±288±268±5
humanoidmaze-large-stitch-v0 6±31±10±03±14±128±339±6
humanoidmaze-giant-stitch-v0 0±00±00±00±00±03±262±6
visual‑antmaze navigate visual-antmaze-medium-navigate-v0 11±222±211±10±094±193±394±5
visual-antmaze-large-navigate-v0 4±05±14±10±084±153±988±3
visual-antmaze-giant-navigate-v0 0±11±10±00±047±26±436±7
visual-antmaze-teleport-navigate-v0 5±18±16±16±348±237±247±3
stitch visual-antmaze-medium-stitch-v0 67±46±22±00±069±287±295±2
visual-antmaze-large-stitch-v0 24±31±10±01±111±328±090±2
visual-antmaze-giant-stitch-v0 0±00±00±00±00±00±055±6
visual-antmaze-teleport-stitch-v0 32±31±11±01±232±637±221±4
explore visual-antmaze-medium-explore-v0 0±00±00±00±00±00±078±10
visual-antmaze-large-explore-v0 0±01±00±00±01±00±026±10
visual-antmaze-teleport-explore-v0 0±00±00±00±01±019±834±4
cube play cube-single-play-v0 6±253±468±65±119±215±368±6
cube-double-play-v0 1±136±540±51±010±26±22±1
scene play scene-play-v0 5±142±451±45±119±238±326±3

BibTeX

@inproceedings{shehmar2026alps,
  title     = {Laplacian Representations for Decision-Time Planning},
  author    = {Shehmar, Dikshant and Schlegel, Matthew and Taylor, Matthew E.
               and Machado, Marlos C.},
  booktitle = {International Conference on Machine Learning(ICML)},
  year      = {2026},
}