TY - GEN
T1 - Online expectation maximization for reinforcement learning in POMDPs
AU - Liu, Miao
AU - Liao, Xuejun
AU - Carin, Lawrence
N1 - Generated from Scopus record by KAUST IRTS on 2021-02-09
PY - 2013/12/1
Y1 - 2013/12/1
N2 - We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only on the current learning episode, discarding the episode after the evaluation and memorizing the sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has O(n) time complexity and O(1) memory complexity, compared to O(n^2) and O(n) for the corresponding batch-mode algorithm, where n is the number of learning episodes. The online algorithm, which has provable convergence, is demonstrated on five benchmark POMDP problems.
UR - http://www.scopus.com/inward/record.url?scp=84896062709&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781577356332
SP - 1501
EP - 1507
BT - IJCAI International Joint Conference on Artificial Intelligence
ER -