Speeding up Q(λ)-learning

Marco Wiering, Jürgen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Scopus citation

Abstract

Q(λ)-learning uses TD(λ) methods to accelerate Q-learning. The worst-case complexity of a single update step in previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's worst-case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
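The paper itself gives the exact algorithm; as a rough illustration of the postponement idea only, here is a minimal Python sketch (all names and structure are ours, not the paper's) of lazy tabular Q(λ) with replacing traces. The trick: with a global decay η_t = (γλ)^t and an accumulator D_t = Σ_i α·δ_i·η_i, a pair last synced at time t0 can catch up on all of its decayed-trace updates with a single term, (D_t − D_{t0})/η_{t0}, so each step touches only the current pair and the next state's actions. The sketch omits the paper's handling of exploratory actions and its numerical rescaling of the accumulator.

```python
from collections import defaultdict


class FastQLambda:
    """Tabular Q(lambda) with postponed ("lazy") trace updates.

    Illustrative sketch only; names and structure are assumptions,
    not the paper's notation.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, lam=0.9):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.Q = defaultdict(float)   # Q[(state, action)] -> value
        self.visited = {}             # (state, action) -> (eta, D) at last sync
        self.eta = 1.0                # global decay (gamma * lam) ** t
        self.D = 0.0                  # accumulator: sum of alpha * delta_i * eta_i

    def _sync(self, s, a):
        """Flush the updates postponed since (s, a) was last synced."""
        if (s, a) in self.visited:
            eta0, D0 = self.visited[(s, a)]
            # All decayed-trace updates since the last sync collapse into one term.
            self.Q[(s, a)] += (self.D - D0) / eta0
            self.visited[(s, a)] = (eta0, self.D)  # now up to date

    def _synced_q(self, s, a):
        self._sync(s, a)
        return self.Q[(s, a)]

    def value(self, s):
        """max_a Q(s, a) after flushing pending updates: O(n_actions) work."""
        return max(self._synced_q(s, a) for a in range(self.n_actions))

    def update(self, s, a, r, s_next, done):
        """One online step; worst-case cost O(n_actions), not O(|S||A|)."""
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self._synced_q(s, a)
        # Advance the global clock. NOTE: this sketch omits the paper's
        # rescaling trick, so eta underflows on very long episodes.
        self.eta *= self.gamma * self.lam
        self.D += self.alpha * delta * self.eta
        # Replacing trace: the visited pair's trace becomes 1, so it takes
        # the full step immediately and its record restarts from "now".
        self.Q[(s, a)] += self.alpha * delta
        self.visited[(s, a)] = (self.eta, self.D)

    def end_episode(self):
        """Flush everything once per episode, then reset traces."""
        for key in list(self.visited):
            self._sync(*key)
        self.visited.clear()
        self.eta, self.D = 1.0, 0.0
```

Each call to update touches only the current pair plus the next state's n_actions values, matching the abstract's bound; the one full flush in end_episode happens once per episode rather than once per step.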
Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 352-363
Number of pages: 12
ISBN (Print): 3540644172
DOIs
State: Published - Jan 1 1998
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2022-09-14

