Motion forecasting is essential for making intelligent decisions in robotic navigation. As a result, multi-agent behavioral prediction has become a core component of modern human-robot interaction applications such as autonomous driving. Because agents have diverse intentions and interact with one another, their trajectories can have multiple possible futures. Hence, a motion forecasting model must be able to cover the possible modes in order to predict accurately. Towards this goal, we introduce HalentNet, which better models the future motion distribution by incorporating generative augmentation losses in addition to a traditional trajectory regression objective. We model intents with unsupervised discrete random variables whose training is guided by two collaborating signals: a discriminative loss that encourages intent diversity, and a hallucinative loss that explores intent transitions (i.e., mixed intents) and encourages their smoothness. This active yet careful exploration of possible future agent behavior regularizes the network so that it predicts more accurately in uncertain scenarios. Our model's learned representation leads to better and more semantically meaningful coverage of the trajectory distribution. Our experiments show that our method improves over the state of the art on trajectory forecasting benchmarks covering both vehicles and pedestrians, by about 20% in average final displacement error (FDE) and 50% in road boundary violation rate when predicting 6 seconds into the future. We also conducted human experiments showing that our predicted trajectories received 39.6% more votes than the runner-up approach and 32.2% more votes than our variant without the hallucinative mixed-intent loss.
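The results above are reported partly in terms of average FDE. As a minimal sketch, assuming the standard definition (Euclidean distance between the predicted and ground-truth positions at the final timestep, averaged over agents) and assuming the "about 20%" figure is a relative reduction against the baseline, the metric and the improvement arithmetic could be computed as follows. The function names, array layout, and toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def final_displacement_error(pred, gt):
    """Mean FDE over a batch of single-trajectory predictions.

    Assumed layout: pred and gt have shape (num_agents, num_timesteps, 2)
    holding x/y positions. FDE is the Euclidean distance between predicted
    and ground-truth positions at the last timestep, averaged over agents.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape or pred.ndim != 3 or pred.shape[-1] != 2:
        raise ValueError("expected matching arrays of shape (N, T, 2)")
    # Per-agent distance at the final timestep.
    final_err = np.linalg.norm(pred[:, -1, :] - gt[:, -1, :], axis=-1)
    return float(final_err.mean())


def relative_reduction(baseline, ours):
    """Fractional error reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


# Toy example (made-up data): 2 agents, 3 timesteps.
gt = np.zeros((2, 3, 2))
pred = np.zeros((2, 3, 2))
pred[0, -1] = [3.0, 4.0]  # agent 0 ends 5 units from its true endpoint
fde = final_displacement_error(pred, gt)
print(fde)                            # 2.5
print(relative_reduction(2.5, 2.0))   # 0.2, i.e. a 20% reduction
```

Multi-modal benchmarks often also report a best-of-K variant (minimum FDE over K sampled trajectories); this sketch covers only one prediction per agent.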
Original language: English (US)
State: Published, 2021
Event: 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
Duration: May 3, 2021 → May 7, 2021
Bibliographical note
Funding information:
This work is funded by KAUST BAS/1/1685-01-0. The authors wish to thank the Amazon Mechanical Turkers who helped with our human studies.
© 2021 ICLR 2021 - 9th International Conference on Learning Representations. All rights reserved.
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Linguistics and Language