TY - GEN
T1 - Restless bandits with switching costs: Linear programming relaxations, performance bounds and limited lookahead policies
AU - Le Ny, Jerome
AU - Feron, Eric
N1 - Generated from Scopus record by KAUST IRTS on 2021-02-18
PY - 2006/12/1
Y1 - 2006/12/1
N2 - The multi-armed bandit problem and one of its most interesting extensions, the restless bandits problem, are frequently encountered in various stochastic control problems. We present a linear programming relaxation for the restless bandits problem with discounted rewards, where only one project can be activated at each period but with additional costs penalizing switching between projects. The relaxation can be efficiently computed and provides a bound on the achievable performance. We describe several heuristic policies; in particular, we show that a policy adapted from the primal-dual heuristic of Bertsimas and Niño-Mora [1] for the classical restless bandits problem is in fact equivalent to a one-step lookahead policy; thus, the linear programming relaxation provides a means to compute an approximation of the cost-to-go. Moreover, the approximate cost-to-go is decomposable by project, and this allows the one-step lookahead policy to take the form of an index policy, which can be computed on-line very efficiently. We present numerical experiments, in which we assess the quality of the heuristics using the performance bound. © 2006 IEEE.
UR - http://www.scopus.com/inward/record.url?scp=34047223046&partnerID=8YFLogxK
M3 - Conference contribution
SN - 1424402107
SP - 1587
EP - 1592
BT - Proceedings of the American Control Conference
ER -