Modular value iteration through regional decomposition

Linus Gisslén, Mark Ring, Matthew Luciw, Jürgen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Future AGIs will need to solve large reinforcement-learning problems involving complex reward functions with multiple reward sources. One way to make progress on such problems is to decompose them into smaller regions that can be solved efficiently. We introduce a novel modular version of Least Squares Policy Iteration (LSPI), called M-LSPI, which (1) breaks up Markov decision problems (MDPs) into a set of mutually exclusive regions, and (2) iteratively solves each region by a single matrix inversion, then combines the solutions by value iteration. The resulting algorithm exploits regional decomposition to solve the MDP efficiently. As the number of states increases, on both structured and unstructured MDPs, M-LSPI yields substantial improvements over traditional algorithms in time to convergence to the value function of the optimal policy, especially as the discount factor approaches one. © 2012 Springer-Verlag.
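The two-step scheme described in the abstract, solving each region exactly and then propagating values between regions, can be illustrated on a tabular MDP. Below is a minimal sketch of regional policy evaluation, assuming a fixed policy, a known transition matrix, and an illustrative two-region partition; the function name and the example MDP are hypothetical, and the paper's M-LSPI additionally works with LSTD feature matrices inside a full policy-iteration loop, which is not reproduced here.

```python
import numpy as np

def regional_policy_evaluation(P, R, gamma, regions, tol=1e-8, max_sweeps=10000):
    """Evaluate a fixed policy by solving each region with one matrix
    inversion and sweeping over the regions until boundary values agree.

    P       : (n, n) state-transition matrix under the policy
    R       : (n,)   expected immediate reward per state
    gamma   : discount factor
    regions : list of index arrays forming a partition of the states
    """
    n = len(R)
    V = np.zeros(n)
    for _ in range(max_sweeps):
        V_old = V.copy()
        for idx in regions:
            inside = np.zeros(n, dtype=bool)
            inside[idx] = True
            # Bellman equations restricted to the region, holding the
            # values of states outside the region fixed:
            #   V_r = R_r + gamma * P_rr V_r + gamma * P_rb V_b
            #   => (I - gamma * P_rr) V_r = R_r + gamma * P_rb V_b
            P_rr = P[np.ix_(idx, idx)]
            b = R[idx] + gamma * P[idx][:, ~inside] @ V[~inside]
            V[idx] = np.linalg.solve(np.eye(len(idx)) - gamma * P_rr, b)
        if np.max(np.abs(V - V_old)) < tol:
            break
    return V

# Example: a random 6-state MDP split into two 3-state regions.
rng = np.random.default_rng(0)
P = rng.random((6, 6))
P /= P.sum(axis=1, keepdims=True)          # make rows stochastic
R = rng.random(6)
V = regional_policy_evaluation(P, R, 0.99, [np.arange(3), np.arange(3, 6)])
V_exact = np.linalg.solve(np.eye(6) - 0.99 * P, R)
assert np.allclose(V, V_exact, atol=1e-6)  # matches the direct solve
```

Because updated regional values are used immediately within a sweep, this is a block Gauss-Seidel iteration; since I - gamma * P is strictly diagonally dominant for gamma < 1, the sweeps converge to the exact value function.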
Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 69-78
Number of pages: 10
DOIs
State: Published - Dec 26 2012
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2022-09-14

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
