diff --git a/README.md b/README.md
index 52b53a9..f056c40 100644
--- a/README.md
+++ b/README.md
@@ -1,25 +1,3 @@
-# SAC+AE implementation in PyTorch
-
-This is PyTorch implementation of SAC+AE from
-
-**Improving Sample Efficiency in Model-Free Reinforcement Learning from Images** by
-
-[Denis Yarats](https://cs.nyu.edu/~dy1042/), [Amy Zhang](https://mila.quebec/en/person/amy-zhang/), [Ilya Kostrikov](https://github.com/ikostrikov), [Brandon Amos](http://bamos.github.io/), [Joelle Pineau](https://www.cs.mcgill.ca/~jpineau/), [Rob Fergus](https://cs.nyu.edu/~fergus/pmwiki/pmwiki.php).
-
-[[Paper]](https://arxiv.org/abs/1910.01741) [[Webpage]](https://sites.google.com/view/sac-ae/home)
-
-## Citation
-If you use this repo in your research, please consider citing the paper as follows
-```
-@article{yarats2019improving,
-    title={Improving Sample Efficiency in Model-Free Reinforcement Learning from Images},
-    author={Denis Yarats and Amy Zhang and Ilya Kostrikov and Brandon Amos and Joelle Pineau and Rob Fergus},
-    year={2019},
-    eprint={1910.01741},
-    archivePrefix={arXiv}
-}
-```
-
 ## Requirements
 We assume you have access to a gpu that can run CUDA 9.2. Then, the simplest way to install all required dependencies is to create an anaconda environment by running:
 ```
@@ -31,7 +9,7 @@ source activate pytorch_sac_ae
 ```
 
 ## Instructions
-To train an SAC+AE agent on the `cheetah run` task from image-based observations run:
+To train an agent on the `cheetah run` task from image-based observations run:
 ```
 python train.py \
     --domain_name cheetah \
@@ -42,7 +20,9 @@ python train.py \
     --save_video \
     --save_tb \
     --work_dir ./log \
-    --seed 1
+    --seed 1 \
+    --img_source video \
+    --resource_files "/media/vedant/cpsDataStorageWK/Vedant/train/*.mp4"
 ```
 This will produce 'log' folder, where all the outputs are going to be stored including train/eval logs, tensorboard blobs, and evaluation episode videos. One can attacha tensorboard to monitor training by running:
 ```
@@ -71,9 +51,3 @@ while an evaluation entry:
 | eval | S: 0 | ER: 21.1676
 ```
 which just tells the expected reward `ER` evaluating current policy after `S` steps. Note that `ER` is average evaluation performance over `num_eval_episodes` episodes (usually 10).
-
-## Results
-Our method demonstrates significantly improved performance over the baseline SAC:pixel. It matches the state-of-the-art performance of model-based algorithms, such as PlaNet (Hafner et al., 2018) and SLAC (Lee et al., 2019), as well
-as a model-free algorithm D4PG (Barth-Maron et al., 2018), that also learns from raw images. Our
-algorithm exhibits stable learning across ten random seeds and is extremely easy to implement.
-![Results](results/graph.png)
diff --git a/conda_env.yml b/conda_env.yml
index 5f0ae94..c4993ca 100644
--- a/conda_env.yml
+++ b/conda_env.yml
@@ -1,17 +1,20 @@
-name: pytorch_sac_ae
+name: pytorch_sac_ae2
 channels:
   - defaults
 dependencies:
-  - python=3.6
+  - python=3.7
   - pytorch
   - torchvision
   - cudatoolkit=9.2
   - absl-py
-  - pyparsing
+  - pyparsing < 3.0.0
   - pip:
     - termcolor
-    - git+git://github.com/deepmind/dm_control.git
-    - git+git://github.com/denisyarats/dmc2gym.git
+    - git+https://github.com/deepmind/dm_control.git
+    - git+https://github.com/denisyarats/dmc2gym.git
     - tb-nightly
     - imageio
     - imageio-ffmpeg
+    - scikit-image
+    - scikit-video
+    - opencv-python
\ No newline at end of file
diff --git a/encoder.py b/encoder.py
index b51b784..fd4f7c9 100644
--- a/encoder.py
+++ b/encoder.py
@@ -109,7 +109,10 @@ class PixelEncoder(nn.Module):
         out_dim = OUT_DIM[num_layers]
         self.fc = nn.Linear(num_filters * out_dim * out_dim, self.feature_dim * 2)
         self.ln = nn.LayerNorm(self.feature_dim * 2)
+<<<<<<< HEAD
         self.combine = nn.Linear(self.feature_dim + 6, self.feature_dim)
+=======
+>>>>>>> origin/tester_1
 
         self.outputs = dict()
 
@@ -154,7 +157,11 @@ class PixelEncoder(nn.Module):
         out = self.reparameterize(mu, logstd)
         self.outputs['tanh'] = out
 
+<<<<<<< HEAD
         return out, mu, logstd
+=======
+        return out
+>>>>>>> origin/tester_1
 
     def copy_conv_weights_from(self, source):
         """Tie convolutional layers"""
diff --git a/logger.py b/logger.py
index 3a2adda..54c1142 100644
--- a/logger.py
+++ b/logger.py
@@ -163,4 +163,4 @@ class Logger(object):
 
     def dump(self, step):
         self._train_mg.dump(step, 'train')
-        self._eval_mg.dump(step, 'eval')
+        self._eval_mg.dump(step, 'eval')
\ No newline at end of file
diff --git a/train.py b/train.py
index 75314b5..22d01a7 100644
--- a/train.py
+++ b/train.py
@@ -28,7 +28,10 @@ def parse_args():
     parser.add_argument('--frame_stack', default=3, type=int)
     parser.add_argument('--img_source', default=None, type=str, choices=['color', 'noise', 'images', 'video', 'none'])
     parser.add_argument('--resource_files', type=str)
+<<<<<<< HEAD
     parser.add_argument('--resource_files_test', type=str)
+=======
+>>>>>>> origin/tester_1
     parser.add_argument('--total_frames', default=10000, type=int)
     # replay buffer
     parser.add_argument('--replay_buffer_capacity', default=100000, type=int)
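
Note that the encoder.py and train.py hunks above commit literal merge-conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> origin/tester_1`) into the sources, so both modules will raise a `SyntaxError` on import until the conflicts are resolved. The sketch below is a minimal, hypothetical illustration of the stochastic encoder head that the HEAD side of those hunks implies: an `fc`/`LayerNorm` stack sized to `2 * feature_dim`, split into `mu`/`logstd` and passed through a reparameterization step. The class name, the input dimension, and the exact sampling code are assumptions for illustration, not the repository's verified implementation.
```
import torch
import torch.nn as nn


class StochasticHead(nn.Module):
    """Hypothetical fc/LayerNorm head matching the HEAD side of the encoder.py hunks."""

    def __init__(self, in_dim, feature_dim):
        super().__init__()
        # The diff sizes both layers to feature_dim * 2 so the output can be
        # split into a mean and a log-standard-deviation of equal width.
        self.fc = nn.Linear(in_dim, feature_dim * 2)
        self.ln = nn.LayerNorm(feature_dim * 2)

    @staticmethod
    def reparameterize(mu, logstd):
        # Standard reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
        # which keeps gradients flowing into mu and logstd.
        eps = torch.randn_like(mu)
        return mu + eps * logstd.exp()

    def forward(self, h):
        mu, logstd = self.ln(self.fc(h)).chunk(2, dim=-1)
        out = self.reparameterize(mu, logstd)
        # HEAD returns the sample plus its parameters; origin/tester_1 returns only `out`.
        return out, mu, logstd


if __name__ == '__main__':
    head = StochasticHead(in_dim=1024, feature_dim=50)  # in_dim is arbitrary here
    out, mu, logstd = head(torch.randn(8, 1024))
    print(out.shape, mu.shape, logstd.shape)  # each: torch.Size([8, 50])
```
Returning `mu` and `logstd` alongside the sample is what would let a downstream loss regularize the latent distribution, whereas the origin/tester_1 side keeps the single-tensor interface; whichever side is ultimately kept, the conflict markers themselves have to be removed before encoder.py and train.py will run.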