DeepMind Control Suite.

This submodule contains the domains and tasks described in the DeepMind Control Suite tech report.

Quickstart

from dm_control import suite
import numpy as np

# Load one task:
env = suite.load(domain_name="cartpole", task_name="swingup")

# Iterate over a task set:
for domain_name, task_name in suite.BENCHMARKING:
  env = suite.load(domain_name, task_name)

# Step through an episode and print out reward, discount and observation.
action_spec = env.action_spec()
time_step = env.reset()
while not time_step.last():
  action = np.random.uniform(action_spec.minimum,
                             action_spec.maximum,
                             size=action_spec.shape)
  time_step = env.step(action)
  print(time_step.reward, time_step.discount, time_step.observation)
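
The stepping loop above can also be wrapped in a small helper that accumulates the episode return instead of printing every step. The following is only a sketch: the name `random_episode_return` and its `rng` parameter are hypothetical, not part of the suite, and it assumes only the environment interface used in the quickstart (`action_spec()`, `reset()`, `step()`, `time_step.last()`, `time_step.reward`).

```python
import numpy as np


def random_episode_return(env, rng=None):
  """Runs one episode with uniform random actions (hypothetical helper).

  Returns a (total_reward, num_steps) tuple.
  """
  rng = rng if rng is not None else np.random.RandomState()
  action_spec = env.action_spec()
  time_step = env.reset()
  total_reward, num_steps = 0.0, 0
  while not time_step.last():
    action = rng.uniform(action_spec.minimum,
                         action_spec.maximum,
                         size=action_spec.shape)
    time_step = env.step(action)
    # The reward can be None (e.g. on the first time step), so treat it as 0.
    total_reward += time_step.reward or 0.0
    num_steps += 1
  return total_reward, num_steps
```

With an environment loaded as in the quickstart, `random_episode_return(env)` should give a rough random-policy baseline for that task.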

Illustration video

Below is a video montage of solved Control Suite tasks, with reward visualisation enabled.

Video montage

Quadruped domain [April 2019]

Roughly based on the 'ant' model introduced by Schulman et al. 2015. Main modifications to the body are:

  • 4 DoFs per leg, 1 constraining tendon.
  • 3 actuators per leg: 'yaw', 'lift', 'extend'.
  • Filtered position actuators with a timescale of 100 ms.
  • Sensors include an IMU, force/torque sensors, and rangefinders.
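
To give a feel for what a 100 ms filter timescale means for the actuators, here is a minimal sketch. It assumes the filter behaves as a first-order low-pass, `d(act)/dt = (ctrl - act) / tau`, integrated with explicit Euler steps. The function name, the step size `dt`, and the integration scheme are illustrative assumptions, not taken from the model file or the simulator.

```python
def filtered_activation(ctrl_sequence, tau=0.1, dt=0.005, act0=0.0):
  """Applies an assumed first-order low-pass filter to a control sequence.

  tau: filter timescale in seconds (0.1 s = 100 ms).
  dt:  integration step in seconds (hypothetical value).
  """
  act = act0
  activations = []
  for ctrl in ctrl_sequence:
    # Explicit Euler step of d(act)/dt = (ctrl - act) / tau.
    act += dt * (ctrl - act) / tau
    activations.append(act)
  return activations


# Step input: the control jumps from 0 to 1 and is held for 0.5 s.
response = filtered_activation([1.0] * 100)
```

Under these assumptions the activation reaches roughly two thirds of a step command after one timescale (100 ms) and approaches the command smoothly afterwards, so abrupt changes in the policy's actions are smoothed before they drive the joints.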

Four tasks:

  • walk and run: self-right the body then move forward at a desired speed.
  • escape: escape a bowl-shaped random terrain (uses rangefinders).
  • fetch: go to a moving ball and bring it to a target.

All behaviors in the video below were trained with Abdolmaleki et al.'s MPO.

Video montage