Learning Task Informed Abstractions (TIA)
Left to right: Raw Observation, Dreamer, Joint of TIA, Task Stream of TIA, Distractor Stream of TIA
This code base contains a minimal modification over Dreamer/DreamerV2 to learn disentangled world models, presented in:
Learning Task Informed Abstractions
Xiang Fu*, Ge Yang*, Pulkit Agrawal, Tommi Jaakkola
The directory Dreamer contains code for running DeepMind Control Suite (DMC) experiments. The directory DreamerV2 contains code for running Atari experiments. This implementation was tested with Python 3.6, TensorFlow 2.3.1 and CUDA 10.1. The training/evaluation metrics used for producing the figures in the paper can be downloaded from this Google Drive link.
Getting started
Get dependencies:
pip3 install --user tensorflow-gpu==2.3.1
pip3 install --user tensorflow_probability==0.11.0
pip3 install --user gym
pip3 install --user pandas
pip3 install --user matplotlib
pip3 install --user ruamel.yaml
pip3 install --user scikit-image
pip3 install --user git+https://github.com/deepmind/dm_control.git
pip3 install --user 'gym[atari]'
You will need an active MuJoCo license for running DMC experiments.
Running DMC experiments with distracting background
Code for running DMC experiments is under the directory Dreamer.
To run DMC experiments with distracting video backgrounds, you can download a small set of 16 videos (videos with names starting with "A" in the Kinetics 400 dataset's driving_car class) from this Google Drive link. These videos were used to produce Figure 9(a) in the paper's appendix.
To replicate the setup of DBC and use more background videos, first download the Kinetics 400 dataset and take the videos labeled driving_car from the train split. Use the following repo to download the dataset:
https://github.com/Showmax/kinetics-downloader
Train the agent:
python run.py --method dreamer --configs dmc --task dmc_cheetah_run_driving --logdir ~/logdir --video_dir VIDPATH
VIDPATH should contain *.mp4 video files. (If you used the above repo to download the Kinetics videos, set VIDPATH to PATH_TO_REPO/kinetics-downloader/dataset/train/driving_car.)
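Before launching a long run, it can help to confirm that the directory passed as --video_dir actually holds video files. The following is a minimal sketch and is not part of this code base: the helper name list_background_videos is hypothetical, and it only checks file extensions, not whether the videos can be decoded.

```python
from pathlib import Path


def list_background_videos(video_dir):
    """Return the sorted *.mp4 files found directly inside video_dir.

    Hypothetical helper for illustration only; it is not part of this repo.
    """
    root = Path(video_dir).expanduser()
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # Only files with the .mp4 extension are considered, per the README.
    videos = sorted(p for p in root.glob("*.mp4") if p.is_file())
    if not videos:
        raise FileNotFoundError(f"no *.mp4 files found in {root}")
    return videos
```

Calling list_background_videos("VIDPATH") with your own path raises an error early if the directory is missing or empty, instead of failing after training has started.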
Choose method from:
[dreamer, tia, inverse]
corresponding to the original Dreamer, TIA, and representation learned with an inverse model as described in Section 4.2 of the paper.
Choose environment + distraction (e.g. dmc_cheetah_run_driving):
dmc_{domain}_{task}_{distraction}
where {domain} (e.g., cheetah, walker, hopper, etc.) and {task} (e.g., run, walk, stand, etc.) are from the DeepMind Control Suite, and {distraction} can be chosen from:
[none, noise, driving]
where each option uses different backgrounds:
none: default (no) background
noise: white noise background
driving: natural videos from the "driving car" class as background
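The naming scheme above can be sketched in a few lines of Python. This is an illustrative helper, not code from this repository: the function names build_task_name and split_task_name are hypothetical, and the parser assumes the domain name itself contains no underscore (true for cheetah, walker and hopper, but not for every DMC domain).

```python
# The three distraction options listed in the README.
DISTRACTIONS = ("none", "noise", "driving")


def build_task_name(domain, task, distraction="none"):
    """Compose a value for --task of the form dmc_{domain}_{task}_{distraction}."""
    if distraction not in DISTRACTIONS:
        raise ValueError(f"distraction must be one of {DISTRACTIONS}")
    return f"dmc_{domain}_{task}_{distraction}"


def split_task_name(name):
    """Split a task name into (domain, task, distraction).

    Assumption: the domain has no underscore, so the first underscore
    after the dmc prefix separates the domain from the task.
    """
    prefix, rest = name.split("_", 1)
    if prefix != "dmc":
        raise ValueError(f"expected a name starting with dmc_, got {name!r}")
    body, distraction = rest.rsplit("_", 1)
    if distraction not in DISTRACTIONS:
        raise ValueError(f"unknown distraction {distraction!r}")
    domain, task = body.split("_", 1)
    return domain, task, distraction
```

For example, build_task_name("cheetah", "run", "driving") gives the name used in the training command above, and split_task_name recovers the three parts from it.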
Running Atari experiments
Code for running Atari experiments is under the directory DreamerV2.
Train the agent with the game Demon Attack:
python dreamer.py --logdir ~/logdir/atari_demon_attack/TIA/1 \
--configs defaults atari --task atari_demon_attack
Monitoring results
Both DMC and Atari experiments log to TensorBoard by default. The decomposition of the two streams of TIA is visualized as .gif animations. Access TensorBoard with the command:
tensorboard --logdir LOGDIR
Citation
If you find this code useful, please consider citing our paper:
@InProceedings{fu2021learning,
title = {Learning Task Informed Abstractions},
author = {Fu, Xiang and Yang, Ge and Agrawal, Pulkit and Jaakkola, Tommi},
booktitle = {Proceedings of the 38th International Conference on Machine Learning},
pages = {3480--3491},
year = {2021},
editor = {Meila, Marina and Zhang, Tong},
volume = {139},
series = {Proceedings of Machine Learning Research},
month = {18--24 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v139/fu21b/fu21b.pdf},
url = {http://proceedings.mlr.press/v139/fu21b.html}
}
Reference
We modify Dreamer for DMC environments and DreamerV2 for Atari games. Thanks Danijar for releasing his very clean implementation! Utilities such as
- logging with Tensorboard/JSON line files
- debugging with the debug flag
- mixed precision training
are the same as in the respective original implementations.
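JSON line files can be inspected without TensorBoard. The sketch below reads such a file with the Python standard library only. It is not part of this code base: the helper names are hypothetical, and the file name and the metric keys depend on what a given run actually writes, so check your log directory first.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).expanduser().open() as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def values_for(records, key):
    """Return the values of key from every record that contains it."""
    return [r[key] for r in records if key in r]
```

A typical use would be records = read_jsonl(path_to_your_log_file) followed by values_for(records, some_metric_name), where both arguments are placeholders for names taken from your own run.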