ALPS - Augmented Laplacian Planning with Subgoals

Abstract

Planning with a learned model remains a key challenge in model-based reinforcement learning because of the compounding error problem. In decision-time planning, state representations are critical: they must support local cost computation while preserving long-horizon temporal structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning by capturing state-space distances at multiple time scales. The Laplacian representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, thereby mitigating the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and demonstrate that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.

Method

Overview of ALPS: In pre-training, ALPS (1) learns the Laplacian representation using the Augmented Lagrangian Laplacian Objective (ALLO), (2) learns a one-step forward model on top of the original state space, (3) learns the behavior prior π_prior using the scaled Laplacian representation, and (4) clusters the dataset using k-means in the scaled Laplacian space to generate the cluster graph. In planning, (5) ALPS takes the current state from the environment and uses the high-level planner to determine the next subgoal, then determines the next action towards this subgoal using the low-level planner.
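The high-level step in (5) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cluster graph links two clusters whenever some dataset transition crosses between them, that edge weights are centroid distances in ψ-space, and that the next subgoal is the first cluster on a shortest path to the goal cluster. The function names (`nearest_cluster`, `build_cluster_graph`, `next_subgoal`) and the toy data are hypothetical and are not taken from the ALPS code.

```python
import heapq
import math


def nearest_cluster(psi, centroids):
    """Index of the centroid closest to the embedding psi (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(psi, centroids[k]))


def build_cluster_graph(transitions, centroids):
    """Link clusters i and j if some dataset transition crosses between them.

    transitions: iterable of (psi_t, psi_next) embedding pairs.
    Assumption: edges are undirected and weighted by centroid distance.
    """
    graph = {k: {} for k in range(len(centroids))}
    for psi_t, psi_next in transitions:
        i = nearest_cluster(psi_t, centroids)
        j = nearest_cluster(psi_next, centroids)
        if i != j:
            weight = math.dist(centroids[i], centroids[j])
            graph[i][j] = weight
            graph[j][i] = weight
    return graph


def next_subgoal(graph, start, goal):
    """Dijkstra from start to goal; return the first cluster on the path.

    Returns goal if start == goal, and None if goal is unreachable.
    """
    if start == goal:
        return goal
    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        dist, u = heapq.heappop(heap)
        if u == goal:
            break
        if dist > best.get(u, math.inf):
            continue  # stale heap entry
        for v, weight in graph[u].items():
            new_dist = dist + weight
            if new_dist < best.get(v, math.inf):
                best[v] = new_dist
                parent[v] = u
                heapq.heappush(heap, (new_dist, v))
    if goal not in parent:
        return None
    node = goal
    while parent[node] != start:
        node = parent[node]
    return node


# Toy example: four cluster centroids in a 2-D psi-space (made-up values).
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
transitions = [
    ((0.1, 0.0), (0.9, 0.0)),  # crosses cluster 0 -> 1
    ((1.1, 0.0), (1.9, 0.0)),  # crosses cluster 1 -> 2
    ((2.0, 0.1), (2.0, 0.9)),  # crosses cluster 2 -> 3
]
graph = build_cluster_graph(transitions, centroids)
subgoal = next_subgoal(graph, start=0, goal=3)
```

In this toy setup the planner returns cluster 1 as the next waypoint. A low-level planner would then use the one-step forward model and the behavior prior to choose actions that move toward that cluster's centroid; that part is not sketched here.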

Commute-time distances (CTD) and spectral clustering in ψ-space

Properties of the Laplacian representation: (Left) Euclidean distances in the scaled Laplacian representation (ψ) match the commute-time distance in the data graph, so distances in ψ-space reflect how hard it is to navigate between states. (Right) k-means clustering in ψ-space groups states into topologically connected regions, which serve as subgoal waypoints.
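The link between a scaled Laplacian embedding and commute times can be checked on a small graph. The sketch below is a toy illustration, not the ALLO training procedure: it assumes an undirected, connected graph given by a hypothetical adjacency matrix, uses the exact eigendecomposition of the combinatorial Laplacian L = D - A, and scales each non-trivial eigenvector by 1/sqrt(eigenvalue). Under these assumptions the squared Euclidean distance between two embedded nodes equals the expected commute time of a random walk between them, so the plain Euclidean distance is its square root.

```python
import numpy as np


def commute_time_embedding(adjacency):
    """Embed the nodes of a connected undirected graph so that squared
    Euclidean distances equal expected random-walk commute times."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # ascending eigenvalues
    # Drop the trivial constant eigenvector (eigenvalue 0).
    eigvals, eigvecs = eigvals[1:], eigvecs[:, 1:]
    volume = degrees.sum()  # sum of degrees = 2 * number of edges
    return np.sqrt(volume) * eigvecs / np.sqrt(eigvals)


# Path graph 0 - 1 - 2 (toy example).
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
psi = commute_time_embedding(adjacency)

# Pairwise squared distances in the embedding.
diff = psi[:, None, :] - psi[None, :, :]
squared_dist = (diff ** 2).sum(axis=-1)
```

For this path graph the expected commute time between the two end nodes is 8 steps and between adjacent nodes is 4 steps, and `squared_dist` reproduces those values up to floating-point error. In ALPS the representation is learned from data rather than computed from a known graph, so this only illustrates the property the figure describes.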

Videos

point-giant-stitch

ant-giant-stitch

ant-large-explore

humanoid-medium-navigate

visual-ant-large-navigate

cube-single-play

Results

Success rate (%) on the OGBench locomotion and manipulation tasks considered. Results are averaged over 8 seeds (4 seeds for pixel-based tasks), and the value after the ± sign is the standard deviation. ALPS outperforms the model-free GCRL algorithms with p < 0.001 under a Holm-Bonferroni-corrected two-sided Wilcoxon signed-rank test. Values in bold denote the largest mean in each row as a visual aid.

Environment	Dataset Type	Dataset	GCBC	GCIVL	GCIQL	QRL	CRL	HIQL	ALPS
pointmaze	navigate	pointmaze-medium-navigate-v0	9±6	63±6	53±8	82±5	29±7	79±5	82±10
		pointmaze-large-navigate-v0	29±6	45±5	34±3	86±9	39±7	58±5	80±8
		pointmaze-giant-navigate-v0	1±2	0±0	0±0	68±7	27±10	46±9	67±11
		pointmaze-teleport-navigate-v0	25±3	45±3	24±7	4±4	24±6	18±4	40±6
	stitch	pointmaze-medium-stitch-v0	23±18	70±14	21±9	80±12	0±1	74±6	94±6
		pointmaze-large-stitch-v0	7±5	12±6	31±2	84±15	0±0	13±6	96±2
		pointmaze-giant-stitch-v0	0±0	0±0	0±0	50±8	0±0	0±0	98±1
		pointmaze-teleport-stitch-v0	31±9	44±2	25±3	9±5	4±3	34±4	13±4
antmaze	navigate	antmaze-medium-navigate-v0	29±4	72±8	71±4	88±3	95±1	96±1	97±2
		antmaze-large-navigate-v0	24±2	16±5	34±4	75±6	83±4	91±2	93±5
		antmaze-giant-navigate-v0	0±0	0±0	0±0	14±3	16±3	65±5	69±9
		antmaze-teleport-navigate-v0	26±3	39±3	35±5	35±5	53±2	42±3	45±3
	stitch	antmaze-medium-stitch-v0	45±11	44±6	29±6	59±7	53±6	94±1	93±7
		antmaze-large-stitch-v0	3±3	18±2	7±2	18±2	11±2	67±5	95±2
		antmaze-giant-stitch-v0	0±0	0±0	0±0	0±0	0±0	2±2	92±3
		antmaze-teleport-stitch-v0	31±6	39±3	17±2	24±5	31±4	36±2	35±11
	explore	antmaze-medium-explore-v0	2±1	19±3	13±2	1±1	3±2	37±10	100±0
		antmaze-large-explore-v0	0±0	10±3	0±0	0±0	0±0	4±5	90±15
		antmaze-teleport-explore-v0	2±1	32±2	7±3	2±2	20±2	34±15	48±6
humanoidmaze	navigate	humanoidmaze-medium-navigate-v0	8±2	24±2	27±2	21±8	60±4	89±2	89±5
		humanoidmaze-large-navigate-v0	1±0	2±1	2±1	5±1	24±4	49±4	56±5
		humanoidmaze-giant-navigate-v0	0±0	0±0	0±0	1±0	3±2	12±4	67±11
	stitch	humanoidmaze-medium-stitch-v0	29±5	12±2	12±3	18±2	36±2	88±2	68±5
		humanoidmaze-large-stitch-v0	6±3	1±1	0±0	3±1	4±1	28±3	39±6
		humanoidmaze-giant-stitch-v0	0±0	0±0	0±0	0±0	0±0	3±2	62±6
visual-antmaze	navigate	visual-antmaze-medium-navigate-v0	11±2	22±2	11±1	0±0	94±1	93±3	94±5
		visual-antmaze-large-navigate-v0	4±0	5±1	4±1	0±0	84±1	53±9	88±3
		visual-antmaze-giant-navigate-v0	0±1	1±1	0±0	0±0	47±2	6±4	36±7
		visual-antmaze-teleport-navigate-v0	5±1	8±1	6±1	6±3	48±2	37±2	47±3
	stitch	visual-antmaze-medium-stitch-v0	67±4	6±2	2±0	0±0	69±2	87±2	95±2
		visual-antmaze-large-stitch-v0	24±3	1±1	0±0	1±1	11±3	28±0	90±2
		visual-antmaze-giant-stitch-v0	0±0	0±0	0±0	0±0	0±0	0±0	55±6
		visual-antmaze-teleport-stitch-v0	32±3	1±1	1±0	1±2	32±6	37±2	21±4
	explore	visual-antmaze-medium-explore-v0	0±0	0±0	0±0	0±0	0±0	0±0	78±10
		visual-antmaze-large-explore-v0	0±0	1±0	0±0	0±0	1±0	0±0	26±10
		visual-antmaze-teleport-explore-v0	0±0	0±0	0±0	0±0	1±0	19±8	34±4
cube	play	cube-single-play-v0	6±2	53±4	68±6	5±1	19±2	15±3	68±6
		cube-double-play-v0	1±1	36±5	40±5	1±0	10±2	6±2	2±1
scene	play	scene-play-v0	5±1	42±4	51±4	5±1	19±2	38±3	26±3
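
The Holm-Bonferroni correction named in the caption can be sketched as follows. This is a generic step-down implementation, assuming one raw p-value per baseline comparison (for example from a paired two-sided Wilcoxon signed-rank test such as `scipy.stats.wilcoxon`); how the authors grouped their comparisons is not stated here. The p-values in the example are made-up placeholders, not results from the paper.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction.

    Returns (adjusted_p_values, reject_flags), both in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The rank-th smallest p-value is multiplied by (m - rank), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted p-values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Placeholder raw p-values for three hypothetical comparisons.
raw_p = [0.01, 0.04, 0.03]
adjusted_p, reject = holm_bonferroni(raw_p, alpha=0.05)
```

With these placeholder inputs only the first comparison stays significant at the 0.05 level after correction, which shows how the procedure is stricter than testing each comparison on its own.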

BibTeX

@inproceedings{shehmar2026alps,
  title     = {Laplacian Representations for Decision-Time Planning},
  author    = {Shehmar, Dikshant and Schlegel, Matthew and Taylor, Matthew E.
               and Machado, Marlos C.},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2026},
}