Abstract
Q(λ)-learning uses TD(λ) methods to accelerate Q-learning. For previous online Q(λ) implementations based on lookup tables, the worst-case complexity of a single update step is bounded by the size of the state/action space. Our faster algorithm's worst-case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
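The intuition behind postponing updates is that eligibility traces decay geometrically, so the total correction a table entry is owed can be reconstructed from a single global accumulator at the moment that entry is actually read. Below is a minimal Python sketch of this lazy-update idea; it assumes tabular Q(λ) with replacing traces, ignores the cutting of traces after exploratory actions, and omits the numerical rescaling a practical implementation needs, so it illustrates the complexity argument rather than reproducing the paper's exact algorithm. All class and method names (FastQLambda, value, update, end_episode) are illustrative.

```python
# Sketch of lazily applied ("postponed") Q(lambda) updates, assuming
# replacing traces and a tabular Q-function.  Not the authors' exact algorithm.
from collections import defaultdict


class FastQLambda:
    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.9):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)   # stored Q-values, possibly stale
        self.visit = {}               # (s, a) -> (step of last visit, phi at last sync)
        self.t = 0                    # global step counter
        self.phi = 0.0                # running sum of delta_k * (gamma*lam)**k

    def _sync(self, sa):
        """Apply all postponed, trace-decayed TD errors to one table entry."""
        if sa in self.visit:
            t_v, phi0 = self.visit[sa]
            self.q[sa] += self.alpha * (self.phi - phi0) / (self.gamma * self.lam) ** t_v
            self.visit[sa] = (t_v, self.phi)   # updates now applied through the current step

    def value(self, s, a):
        self._sync((s, a))
        return self.q[(s, a)]

    def update(self, s, a, r, s2):
        """One online step: O(|actions|) work, independent of the state-space size."""
        next_q = max(self.value(s2, b) for b in self.actions)
        delta = r + self.gamma * next_q - self.value(s, a)
        # The visited pair has trace 1 (replacing traces), so update it directly.
        self.q[(s, a)] += self.alpha * delta
        # Every other pair's update is postponed: fold delta into the global accumulator.
        self.t += 1
        self.phi += delta * (self.gamma * self.lam) ** self.t
        self.visit[(s, a)] = (self.t, self.phi)   # the trace of (s, a) restarts at 1 here

    def end_episode(self):
        """Flush postponed updates and reset the counters, keeping the
        (gamma*lam)**t factors from under- or overflowing over long runs."""
        for sa in list(self.visit):
            self._sync(sa)
        self.visit.clear()
        self.t, self.phi = 0, 0.0
```

Each call to `update` touches only the visited pair and the entries it reads for the successor state, so the per-step cost is bounded by the number of actions; entries that are never read again are never touched.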
Original language | English (US) |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Publisher | Springer Verlag |
Pages | 352-363 |
Number of pages | 12 |
ISBN (Print) | 3540644172 |
State | Published - Jan 1 1998 |
Externally published | Yes |