Multi-agent task assignment in the bandit framework

Jerome Le Ny, Munther Dahleh, Eric Feron

Research output: Chapter in Book/Report/Conference proceedingConference contribution

17 Scopus citations


We consider a task assignment problem for a fleet of UAVs in a surveillance/search mission. We formulate the problem as a restless bandits problem with switching costs and discounted rewards: there are N sites to inspect, each one of them evolving as a Markov chain, with different transition probabilities if the site is inspected or not. The sites evolve independently of each other, there are transition costs cy for moving between sites i and j ∈ {1, . . . ,N}, rewards when visiting the sites, and we maximize a mixed objective function of these costs and rewards. This problem is known to be PSPACE-hard. We present a systematic method, inspired from the work of Bertsimas and Ñiño-Mora [1] on restless bandits, for deriving a linear programming relaxation for such locally decomposable MDPs. The relaxation is computable in polynomial-time offline, provides a bound on the achievable performance, as well as an approximation of the cost-to-go which can be used online in conjunction with standard suboptimal stochastic control methods. In particular, the one-step lookahead policy based on this approximate cost-to-go reduces to computing the optimal value of a linear assignment problem of size N. We present numerical experiments, for which we assess the quality of the heuristics using the performance bound. © 2006 IEEE.
Original languageEnglish (US)
Title of host publicationProceedings of the IEEE Conference on Decision and Control
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages6
ISBN (Print)1424401712
StatePublished - Jan 1 2006
Externally publishedYes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2021-02-18


Dive into the research topics of 'Multi-agent task assignment in the bandit framework'. Together they form a unique fingerprint.

Cite this