Abstract
Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
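The postponement idea in the abstract can be illustrated with a minimal sketch. It is a simplified illustration under stated assumptions, not the paper's published algorithm: it assumes accumulating eligibility traces, a fixed learning rate, no trace cutting after exploratory actions, and trajectories short enough that the decay factor does not underflow. All names (`eager_q_lambda`, `lazy_q_lambda`, `catch_up`, `scaled_trace`, `global_sum`, `seen`) are hypothetical. The eager version touches every state/action pair on each step; the lazy version keeps one global running sum of discounted TD errors and brings a Q-value up to date only when it is about to be read, so each step touches only the actions of the next state plus the current pair.

```python
def eager_q_lambda(steps, n_states, n_actions, alpha, gamma, lam):
    """Reference version: every step updates all (state, action) pairs."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    e = [[0.0] * n_actions for _ in range(n_states)]  # eligibility traces
    for s, a, r, s2 in steps:
        td = r + gamma * max(q[s2]) - q[s][a]
        for i in range(n_states):
            for j in range(n_actions):
                e[i][j] *= gamma * lam
        e[s][a] += 1.0
        for i in range(n_states):
            for j in range(n_actions):
                q[i][j] += alpha * td * e[i][j]
    return q


def lazy_q_lambda(steps, n_states, n_actions, alpha, gamma, lam):
    """Lazy version: a Q-value is updated only when it is needed."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    # Trace divided by (gamma * lam) ** t, so it needs no per-step decay.
    scaled_trace = [[0.0] * n_actions for _ in range(n_states)]
    # Value of global_sum when each pair was last brought up to date.
    seen = [[0.0] * n_actions for _ in range(n_states)]
    phi = 1.0         # (gamma * lam) ** t; assumption: never rescaled here
    global_sum = 0.0  # running sum of td_t * phi_t over all steps so far

    def catch_up(s, a):
        # Apply every TD error accumulated since the last catch-up.
        q[s][a] += alpha * scaled_trace[s][a] * (global_sum - seen[s][a])
        seen[s][a] = global_sum

    for s, a, r, s2 in steps:
        # Only these Q-values are read, so only these are brought up to date.
        for b in range(n_actions):
            catch_up(s2, b)
        catch_up(s, a)
        td = r + gamma * max(q[s2]) - q[s][a]
        scaled_trace[s][a] += 1.0 / phi
        global_sum += td * phi
        phi *= gamma * lam

    # One final sweep so the returned table is fully up to date.
    for i in range(n_states):
        for j in range(n_actions):
            catch_up(i, j)
    return q
```

Here `steps` is a list of `(state, action, reward, next_state)` tuples with integer indices. On the same trajectory both functions should return the same table up to floating-point rounding; the per-step cost of the lazy loop depends on `n_actions` rather than on `n_states * n_actions`.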
Original language | English (US) |
---|---|
Pages (from-to) | 105-115 |
Number of pages | 11 |
Journal | Machine Learning |
Volume | 33 |
Issue number | 1 |
State | Published - Jan 1 1998 |
Externally published | Yes |
Bibliographical note
Generated from Scopus record by KAUST IRTS on 2022-09-14
ASJC Scopus subject areas
- Artificial Intelligence
- Software