We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) that estimate state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs (cerebellar model articulation controllers) and prioritized sweeping-like algorithms. Variants thereof outperform both Q(λ)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE), which performed best in previous comparisons.
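The abstract does not spell out how CMACs and prioritized sweeping are combined. The following Python sketch shows one plausible reading and should not be taken as the authors' method. It assumes that each CMAC tiling keeps its own count-based world model over its tiles (estimated transition probabilities and mean rewards), that prioritized sweeping performs Q-backups on each of those models, and that Q(x, a) for a continuous input x is the average over the active tiles. The class names `CMAC`, `PrioritizedSweeping` and `CMACModelAgent`, the toy one-dimensional task `step`, and the parameter values (`gamma`, `theta`, `n_sweeps`, 4 tilings of 4 tiles) are all hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict


class CMAC:
    """Hypothetical tile coder: n_tilings uniform grids over [low, high), each shifted by a fraction of a tile."""

    def __init__(self, n_tilings, tiles_per_dim, low, high):
        self.n_tilings, self.tiles_per_dim, self.low = n_tilings, tiles_per_dim, low
        self.width = [(h - l) / tiles_per_dim for l, h in zip(low, high)]

    def active_tiles(self, x):
        """Return one (tiling, cell) pair per tiling for the continuous input x."""
        tiles = []
        for t in range(self.n_tilings):
            cell = []
            for i, xi in enumerate(x):
                offset = t * self.width[i] / self.n_tilings
                c = int((xi - self.low[i] + offset) // self.width[i])
                cell.append(min(max(c, 0), self.tiles_per_dim))  # shifted grids need one extra cell
            tiles.append((t, tuple(cell)))
        return tiles


class PrioritizedSweeping:
    """Assumed per-tiling world model (transition counts, summed rewards) with prioritized Q-backups."""

    def __init__(self, actions, gamma=0.9, theta=1e-4, n_sweeps=20):
        self.actions, self.gamma, self.theta, self.n_sweeps = actions, gamma, theta, n_sweeps
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s2: visits}
        self.reward_sum = defaultdict(float)                  # (s, a) -> summed reward
        self.preds = defaultdict(set)                         # s2 -> {(s, a) observed to lead to s2}
        self.Q = defaultdict(float)
        self.queue = []                                       # max-heap via negated priorities

    def backup(self, s, a):
        """Q(s, a) under the estimated model: mean reward + gamma * E[max_b Q(s2, b)]."""
        nexts = self.counts[(s, a)]
        n = sum(nexts.values())
        future = sum(c / n * max(self.Q[(s2, b)] for b in self.actions) for s2, c in nexts.items())
        return self.reward_sum[(s, a)] / n + self.gamma * future

    def _push(self, s, a):
        priority = abs(self.backup(s, a) - self.Q[(s, a)])
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, s, a))

    def observe(self, s, a, r, s2):
        # Update the world model with the experienced transition, then sweep.
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r
        self.preds[s2].add((s, a))
        self._push(s, a)
        for _ in range(self.n_sweeps):
            if not self.queue:
                break
            _, s_top, a_top = heapq.heappop(self.queue)
            self.Q[(s_top, a_top)] = self.backup(s_top, a_top)
            for sp, ap in self.preds[s_top]:  # V(s_top) may have changed: re-rank its predecessors
                self._push(sp, ap)


class CMACModelAgent:
    """One model per tiling; Q(x, a) is the average of the active tiles' Q-values (an assumption)."""

    def __init__(self, cmac, actions, **ps_kwargs):
        self.cmac = cmac
        self.models = [PrioritizedSweeping(actions, **ps_kwargs) for _ in range(cmac.n_tilings)]

    def observe(self, x, a, r, x2):
        for (t, s), (_, s2) in zip(self.cmac.active_tiles(x), self.cmac.active_tiles(x2)):
            self.models[t].observe(s, a, r, s2)

    def q(self, x, a):
        return sum(self.models[t].Q[(s, a)] for t, s in self.cmac.active_tiles(x)) / self.cmac.n_tilings


def step(x, a):
    """Hypothetical 1-D task: action 1 moves right, 0 moves left; reward 1 once x reaches 0.9."""
    x2 = min(max(round(x + (0.1 if a == 1 else -0.1), 2), 0.0), 0.95)
    return (1.0 if x2 >= 0.9 else 0.0), x2


agent = CMACModelAgent(CMAC(4, 4, (0.0,), (1.0,)), actions=(0, 1))
for _ in range(20):
    for i in range(20):
        for a in (0, 1):
            r, x2 = step(i / 20, a)
            agent.observe((i / 20,), a, r, (x2,))
print(agent.q((0.85,), 1) > agent.q((0.85,), 0))
```

Because each tiling's model only sees coarse tiles, the estimated transitions are stochastic and incomplete. The sweeps nevertheless propagate the reward backwards through the partial models, which loosely mirrors the abstract's point that incomplete WMs can still guide action selection.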