Solving DEC-POMDPs by expectation maximization of value functions

Zhao Song, Xuejun Liao, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

We present a new algorithm called PIEM to approximately solve for the policy of an infinite-horizon decentralized partially observable Markov decision process (DEC-POMDP). The algorithm uses expectation maximization (EM) only in the policy improvement step, with policy evaluation achieved by solving the Bellman equation in terms of finite state controllers (FSCs). This marks a key distinction of PIEM from the previous EM algorithm of (Kumar and Zilberstein, 2010): PIEM operates directly on a DEC-POMDP without transforming it into a mixture of dynamic Bayes nets. Thus, PIEM maximizes the value function exactly, avoiding complicated forward/backward message passing and the corresponding computational and memory cost. To overcome local optima, we follow (Pajarinen and Peltonen, 2011) in solving the DEC-POMDP over a finite horizon and using the resulting policy graph to initialize the FSCs. We solve the finite-horizon problem with a modified point-based policy generation (PBPG) algorithm, in which a closed-form solution replaces the linear program used in the original PBPG. Experimental results on benchmark problems show that the proposed algorithms compare favorably to state-of-the-art methods.
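
The policy-evaluation step described above, solving the Bellman equation in terms of FSCs, amounts to a linear solve over the joint (controller-node, environment-state) chain induced by the controller. The following is a minimal, illustrative sketch of that evaluation step in NumPy; it is not the authors' implementation, and the function name `evaluate_fsc`, the tensor layout, and the use of flattened joint action/observation/node indices (rather than per-agent factored FSCs, as a DEC-POMDP would have) are assumptions made here for brevity.

```python
import numpy as np

def evaluate_fsc(T, O, R, action_prob, node_trans, gamma):
    """Exact FSC policy evaluation by solving the linear Bellman system.

    Treats actions, observations, and controller nodes as joint (flattened)
    indices; in a DEC-POMDP these would factor into per-agent FSCs, but the
    evaluation is still a linear solve over the (node, state) chain.

    T[a, s, s']             : P(s' | s, a)
    O[a, s', o]             : P(o | s', a)
    R[s, a]                 : immediate reward
    action_prob[q, a]       : FSC action selection P(a | q)
    node_trans[q, a, o, q'] : FSC node transition P(q' | q, a, o)
    gamma                   : discount factor in [0, 1)

    Returns V of shape (num_nodes, num_states), where
    V[q, s] = E[sum_t gamma^t R_t | q_0 = q, s_0 = s].
    """
    nq, na = action_prob.shape
    ns = T.shape[1]

    # Expected immediate reward: b[q, s] = sum_a P(a|q) R(s, a).
    b = action_prob @ R.T                                   # (nq, ns)

    # Markov chain over (node, state) pairs induced by the controller:
    # M[q, s, q', s'] = sum_{a,o} P(a|q) T(s'|s,a) O(o|s',a) P(q'|q,a,o).
    M = np.zeros((nq, ns, nq, ns))
    for q in range(nq):
        for a in range(na):
            # P(s', o | s, a): combine transition and observation models.
            tso = T[a][:, :, None] * O[a][None, :, :]       # (s, s', o)
            # Sum out the observation against the node-transition model.
            M[q] += action_prob[q, a] * np.einsum(
                'sto,oQ->sQt', tso, node_trans[q, a])

    # Solve (I - gamma * M) V = b over the flattened (node, state) space.
    A = np.eye(nq * ns) - gamma * M.reshape(nq * ns, nq * ns)
    V = np.linalg.solve(A, b.reshape(nq * ns))
    return V.reshape(nq, ns)
```

In the scheme sketched in the abstract, a policy improvement step (EM in the case of PIEM) would then update `action_prob` and `node_trans` to increase these values; that step is specific to the paper and omitted here.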
Original language: English (US)
Title of host publication: AAAI Spring Symposium - Technical Report
Publisher: AI Access Foundation
Pages: 68-76
Number of pages: 9
ISBN (Print): 9781577357544
State: Published - Jan 1 2016
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2021-02-09
