diff --git a/README.md b/README.md
index 52b53a9..f056c40 100644
--- a/README.md
+++ b/README.md
@@ -1,25 +1,3 @@
-# SAC+AE implementation in PyTorch
-
-This is PyTorch implementation of SAC+AE from
-
-**Improving Sample Efficiency in Model-Free Reinforcement Learning from Images** by
-
-[Denis Yarats](https://cs.nyu.edu/~dy1042/), [Amy Zhang](https://mila.quebec/en/person/amy-zhang/), [Ilya Kostrikov](https://github.com/ikostrikov), [Brandon Amos](http://bamos.github.io/), [Joelle Pineau](https://www.cs.mcgill.ca/~jpineau/), [Rob Fergus](https://cs.nyu.edu/~fergus/pmwiki/pmwiki.php).
-
-[[Paper]](https://arxiv.org/abs/1910.01741) [[Webpage]](https://sites.google.com/view/sac-ae/home)
-
-## Citation
-If you use this repo in your research, please consider citing the paper as follows
-```
-@article{yarats2019improving,
-    title={Improving Sample Efficiency in Model-Free Reinforcement Learning from Images},
-    author={Denis Yarats and Amy Zhang and Ilya Kostrikov and Brandon Amos and Joelle Pineau and Rob Fergus},
-    year={2019},
-    eprint={1910.01741},
-    archivePrefix={arXiv}
-}
-```
-
 ## Requirements
 We assume you have access to a GPU that can run CUDA 9.2. Then, the simplest way to install all required dependencies is to create an Anaconda environment by running:
 ```
@@ -31,7 +9,7 @@ source activate pytorch_sac_ae
 ```
 
 ## Instructions
-To train an SAC+AE agent on the `cheetah run` task from image-based observations run:
+To train an agent on the `cheetah run` task from image-based observations, run:
 ```
 python train.py \
     --domain_name cheetah \
@@ -42,7 +20,9 @@ python train.py \
     --save_video \
     --save_tb \
     --work_dir ./log \
-    --seed 1
+    --seed 1 \
+    --img_source video \
+    --resource_files "/media/vedant/cpsDataStorageWK/Vedant/train/*.mp4"
 ```
 This will produce a 'log' folder, where all the outputs are going to be stored, including train/eval logs, tensorboard blobs, and evaluation episode videos. One can attach a tensorboard to monitor training by running:
 ```
@@ -71,9 +51,3 @@ while an evaluation entry:
 ```
 | eval | S: 0 | ER: 21.1676
 ```
 which just tells the expected reward `ER` of evaluating the current policy after `S` steps. Note that `ER` is the average evaluation performance over `num_eval_episodes` episodes (usually 10).
-
-## Results
-Our method demonstrates significantly improved performance over the baseline SAC:pixel. It matches the state-of-the-art performance of model-based algorithms, such as PlaNet (Hafner et al., 2018) and SLAC (Lee et al., 2019), as well
-as a model-free algorithm D4PG (Barth-Maron et al., 2018), that also learns from raw images. Our
-algorithm exhibits stable learning across ten random seeds and is extremely easy to implement.
-![Results](results/graph.png)
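
Before launching a run with the new `--img_source video` option, it can be worth checking that the `--resource_files` glob actually matches some background videos. A minimal, repo-independent sketch (the flag semantics are assumed from the command in the diff above; this helper is not part of the patch):
```
import glob
import sys

# Hypothetical pre-flight check, not part of the repo: confirm that the
# --resource_files pattern expands to at least one background video file.
pattern = "/media/vedant/cpsDataStorageWK/Vedant/train/*.mp4"
videos = sorted(glob.glob(pattern))
if not videos:
    sys.exit(f"No background videos match pattern: {pattern}")
print(f"Found {len(videos)} background videos, e.g. {videos[0]}")
```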
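
Purely as an illustration of the console log format described above (again, not part of the repo), evaluation entries can be scraped from the output like so:
```
import re

# Hypothetical helper, not part of the repo: extract (step, expected reward)
# pairs from console lines such as "| eval | S: 0 | ER: 21.1676".
EVAL_LINE = re.compile(r"\|\s*eval\s*\|\s*S:\s*(\d+)\s*\|\s*ER:\s*([-+]?\d+\.?\d*)")

def parse_eval_entries(text):
    """Return a list of (step, expected_reward) tuples found in `text`."""
    return [(int(step), float(er)) for step, er in EVAL_LINE.findall(text)]

print(parse_eval_entries("| eval | S: 0 | ER: 21.1676"))  # [(0, 21.1676)]
```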