Domain Adversarial Reinforcement Learning

Bonnie Li, Vincent François-Lavet, Thang Doan, Joelle Pineau


We consider the problem of generalization in reinforcement learning when visual aspects of the observations may differ, e.g., when backgrounds change. This class of problems is referred to as block MDPs, and we assume that the agent has access to only a few of the MDPs from the block MDP distribution during training. The agent's performance is then evaluated on new, unknown test domains drawn from the same distribution (e.g., unseen backgrounds). For this "zero-shot RL" task, we enforce the learning of representations whose distribution is invariant to the specific training domains, via a domain adversarial component that modifies the weights of a shared encoder. We show empirically that this yields a significant improvement in generalization to new, unseen domains.
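The abstract does not say how the domain adversarial component modifies the shared encoder. A common construction is the gradient reversal layer from domain-adversarial neural networks (DANN). The sketch below assumes that construction and is not the authors' implementation. A domain classifier is trained to predict which training domain an observation came from, while the encoder receives the *negated*, scaled gradient of that classifier's loss. This pushes the encoder toward features from which the domain cannot be predicted. Everything here is a hypothetical, minimal illustration: the linear encoder, the logistic classifier over two domains, and the names `domain_loss`, `dann_step`, `lam`, and `lr`.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def encode(w, x):
    """Hypothetical linear 'shared encoder': z = w . x (a single scalar feature)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def domain_loss(w, a, b, x, d):
    """Binary cross-entropy of a logistic domain classifier on z.

    d is the domain label (0 or 1); this toy assumes exactly two training domains.
    """
    p = sigmoid(a * encode(w, x) + b)
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))


def dann_step(w, a, b, x, d, lam=1.0, lr=0.1):
    """One hypothetical domain-adversarial update with a gradient reversal layer.

    The classifier (a, b) does gradient *descent* on the domain loss.
    The encoder (w) receives the reversed gradient -lam * dL/dz, so it does
    gradient *ascent* on the domain loss, i.e. it tries to confuse the classifier.
    In a full agent the gradient of the RL objective would also be added to w;
    it is omitted here.
    """
    z = encode(w, x)
    p = sigmoid(a * z + b)
    g = p - d          # dL/dlogit for sigmoid + binary cross-entropy
    grad_a = g * z     # dL/da
    grad_b = g         # dL/db
    grad_z = g * a     # dL/dz, the gradient flowing back into the encoder

    # Domain classifier: ordinary gradient descent.
    a_new = a - lr * grad_a
    b_new = b - lr * grad_b

    # Gradient reversal: negate and scale dL/dz before backpropagating into w.
    reversed_grad_z = -lam * grad_z
    w_new = [wi - lr * reversed_grad_z * xi for wi, xi in zip(w, x)]
    return w_new, a_new, b_new


if __name__ == "__main__":
    w, a, b = [0.5, -0.2], 1.0, 0.0
    x, d = [1.0, 2.0], 1
    before = domain_loss(w, a, b, x, d)
    w2, a2, b2 = dann_step(w, a, b, x, d)
    print("classifier-only loss change:", domain_loss(w, a2, b2, x, d) - before)
    print("encoder-only loss change:   ", domain_loss(w2, a, b, x, d) - before)
```

With `lam = 0` the encoder ignores the domain loss entirely, and larger values trade off the RL objective against domain invariance more aggressively. With more than two training domains, the logistic classifier would typically be replaced by a softmax over domain labels.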