Abstract
In a repeated zero-sum game, two players repeatedly play the same zero-sum game over several stages. We assume that while both players can observe each other's actions, only one player knows the actual game, which was randomly selected from a set of possible games according to a known distribution. The dilemma faced by the informed player is how to trade off the short-term reward of exploiting information against its long-term consequences, since exploitation also risks revealing that information. Classic work by Aumann and Maschler derives a recursive value equation that quantifies this tradeoff and yields a formula for the informed player's optimal policies. However, explicit computation with this model can become prohibitive as the number of game stages increases. In this paper, we derive a suboptimal policy based on the concept of policy improvement. The baseline policy is a non-revealing policy, i.e., one that completely ignores superior information. The improved policy, which is implemented in a receding-horizon manner, strategizes for the current stage while assuming a non-revealing policy for future stages. We show that the improved policy can be computed by solving a linear program whose computational complexity is constant with respect to the length of the game. We derive bounds on the guaranteed performance of the improved policy and establish that the bounds are tight.
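The abstract does not reproduce the formulation itself, so the following is only an illustrative sketch of the non-revealing baseline it refers to: the informed player ignores the realized state and plays optimally in the expected game sum_k p_k A_k, whose value is obtained from a standard matrix-game linear program. The two payoff matrices and the prior below are hypothetical placeholders, not taken from the paper, and the LP formulation shown is the textbook one rather than the paper's receding-horizon program.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and optimal mixed strategy of the row (maximizing) player
    in the zero-sum matrix game with payoff matrix A, via a standard LP."""
    m, n = A.shape
    # Variables: x (row player's mixed strategy, length m) and v (game value).
    # Maximize v  <=>  minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j of A:  v - sum_i A[i, j] * x_i <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probability simplex constraint: sum_i x_i = 1.
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Hypothetical two-state example: payoff matrices A_k and prior p (placeholders).
A = [np.array([[4.0, 0.0], [0.0, 0.0]]),
     np.array([[0.0, 0.0], [0.0, 4.0]])]
p = np.array([0.5, 0.5])

# Non-revealing baseline: play the optimal strategy of the expected game,
# which uses no information about the realized state and so reveals nothing.
A_bar = sum(pk * Ak for pk, Ak in zip(p, A))
v_nr, x_nr = matrix_game_value(A_bar)
print(v_nr, x_nr)
```

Because the baseline strategy depends only on the prior and not on the realized game, repeating it at every stage keeps the uninformed player's belief fixed; the policy-improvement step described in the abstract would then optimize the current stage against this fixed continuation.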
Original language | English (US)
--- | ---
Article number | 6426062
Pages (from-to) | 7752-7757
Number of pages | 6
Journal | Proceedings of the IEEE Conference on Decision and Control
DOIs |
State | Published - 2012
Externally published | Yes
Event | 51st IEEE Conference on Decision and Control, CDC 2012 - Maui, HI, United States, Dec 10-13, 2012
ASJC Scopus subject areas
- Control and Systems Engineering
- Modeling and Simulation
- Control and Optimization