Getting Started with ODRL
This section shows how to create an environment with a dynamics shift and how to run an off-dynamics RL algorithm on it with ODRL.
Supported Tasks
ODRL currently supports the following tasks:
| Task Domain | Friction | Gravity | Kinematic | Morphology | Map Layout | Offline Datasets |
|---|---|---|---|---|---|---|
| Locomotion | ✅ | ✅ | ✅ | ✅ | ❎ | ✅ |
| Navigation | ❎ | ❎ | ❎ | ❎ | ✅ | ✅ |
| Dexterous Manipulation | ❎ | ❎ | ✅ | ✅ | ❎ | ✅ |
Experimental Settings
ODRL covers the following experimental settings:
- Online-Online setting (online source domain and online target domain)
- Offline-Online setting (offline source domain and online target domain)
- Online-Offline setting (online source domain and offline target domain)
- Offline-Offline setting (offline source domain and offline target domain)
Hello World
ODRL supports MuJoCo, AntMaze, Adroit and Sawyer tasks. Each domain has its own function for creating environments, and all of them are used in a similar way:
from envs.mujoco.call_mujoco_env import call_mujoco_env
from envs.adroit.call_adroit_env import call_adroit_env
from envs.antmaze.call_antmaze_env import call_antmaze_env
from envs.sawyer.call_sawyer_env import call_sawyer_env
Each of these functions accepts a dictionary as input, in which `env_name` and `shift_level` must be specified. For example, the ant-friction-0.5 environment corresponds to `env_name` set to `ant-friction` and `shift_level` set to 0.5, passed to `call_mujoco_env`.
Similarly, to run experiments on the pen-broken-joint-easy environment, one can use the following code:
env_config = {
    'env_name': 'pen-broken-joint',
    'shift_level': 'easy',
}
env = call_adroit_env(env_config)
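The naming pattern used above (a full environment name such as ant-friction-0.5 is an `env_name` followed by a `shift_level`) can be captured in a small helper. The sketch below is not part of ODRL: `split_env_id` and `make_env_config` are hypothetical names, and it assumes that the shift level is always the last hyphen-separated token of the full name. Whether ODRL expects numeric levels as numbers or as strings is also an assumption; this sketch converts numeric-looking levels to `int` or `float` and leaves named levels as strings.

```python
def split_env_id(env_id):
    """Split a full name like 'ant-friction-0.5' into (env_name, shift_level).

    Hypothetical helper: assumes the shift level is the last '-'-separated token.
    """
    env_name, sep, level = env_id.rpartition("-")
    if not sep or not env_name or not level:
        raise ValueError(f"cannot split environment id: {env_id!r}")
    # Try integer levels first (e.g. '3'), then fractional ones (e.g. '0.5');
    # anything else (e.g. 'easy') is kept as a string.
    for cast in (int, float):
        try:
            return env_name, cast(level)
        except ValueError:
            continue
    return env_name, level


def make_env_config(env_id):
    """Build the dictionary with the two keys the call_*_env functions require."""
    env_name, shift_level = split_env_id(env_id)
    return {"env_name": env_name, "shift_level": shift_level}


# With ODRL on the Python path, the result could then be passed on, e.g.:
# env = call_mujoco_env(make_env_config("ant-friction-0.5"))
```

Keeping the split in one place avoids hand-writing a dictionary for every environment when sweeping over several shift levels.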
Running Implemented Algorithms
All four experimental settings are run through the `train.py` file. Mode 0 denotes the Online-Online setting, mode 1 the Offline-Online setting, mode 2 the Online-Offline setting, and mode 3 the Offline-Offline setting. One can switch between settings with the `--mode` flag; the default value is 0, i.e., the Online-Online setting. Example commands for each setting are given below:
# online-online
CUDA_VISIBLE_DEVICES=0 python train.py --policy DARC --env hopper-kinematic-legjnt --shift_level easy --seed 1 --mode 0 --dir runs
# offline-online
CUDA_VISIBLE_DEVICES=0 python train.py --policy CQL_SAC --env ant-friction --shift_level 0.5 --srctype medium-replay --seed 1 --mode 1 --dir runs
# online-offline
CUDA_VISIBLE_DEVICES=0 python train.py --policy PAR_BC --env ant-morph-alllegs --shift_level hard --tartype expert --seed 1 --mode 2 --dir runs
# offline-offline
CUDA_VISIBLE_DEVICES=0 python train.py --policy BOSA --env walker2d-kinematic-footjnt --shift_level medium --srctype medium --tartype medium --seed 1 --mode 3 --dir runs
- `--env` specifies the name of the target domain; the source domain is prepared automatically
- `--shift_level` specifies the shift level for the task
- `--srctype` specifies the dataset quality of the source domain dataset
- `--tartype` specifies the dataset quality of the target domain dataset
- `--params` specifies the hyperparameters of the underlying algorithm if one wants to change the defaults, e.g., `--params '{"actor_lr": 0.003}'`
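When launching many runs, it can help to assemble these commands programmatically. The following is a minimal sketch, not part of ODRL: `build_train_command` and `MODES` are hypothetical names, only flags that appear in the examples above are used, and it assumes (based on those examples) that `--srctype` and `--tartype` matter only when the corresponding domain is offline, so they are omitted otherwise. The `--params` value is serialized with `json.dumps` and shell-quoted with `shlex.join`.

```python
import json
import shlex

# Mode number -> (source domain, target domain), as described in the text.
MODES = {
    0: ("online", "online"),
    1: ("offline", "online"),
    2: ("online", "offline"),
    3: ("offline", "offline"),
}


def build_train_command(policy, env, shift_level, mode=0, srctype=None,
                        tartype=None, seed=1, out_dir="runs", params=None):
    """Return a shell command string for train.py (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    source, target = MODES[mode]
    args = ["python", "train.py", "--policy", policy,
            "--env", env, "--shift_level", str(shift_level)]
    # Assumption: dataset-quality flags are only meaningful for offline domains.
    if source == "offline" and srctype is not None:
        args += ["--srctype", srctype]
    if target == "offline" and tartype is not None:
        args += ["--tartype", tartype]
    args += ["--seed", str(seed), "--mode", str(mode), "--dir", out_dir]
    if params:
        # Hyperparameter overrides are passed as a JSON string.
        args += ["--params", json.dumps(params)]
    return shlex.join(args)


print(build_train_command("CQL_SAC", "ant-friction", 0.5, mode=1,
                          srctype="medium-replay"))
```

The printed command matches the offline-online example above without the `CUDA_VISIBLE_DEVICES=0` prefix, which can be prepended as needed.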
The offline source domain datasets are taken directly from the D4RL library. Note that the available shift levels differ across dynamics shift tasks; they are summarized below.
| Task | Supported Shift Levels |
|---|---|
| Locomotion friction/gravity | 0.1, 0.5, 2.0, 5.0 |
| Locomotion kinematic/morphology | easy, medium, hard |
| AntMaze small maze | centerblock, empty, lshape, zshape, reverseu, reversel |
| AntMaze medium/large maze | 1, 2, 3, 4, 5, 6 |
| Dexterous Manipulation | easy, medium, hard |
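Because the valid shift levels depend on the task, it can be useful to check a configuration before starting a run. The sketch below simply encodes the table above as a dictionary; the task keys (such as `locomotion-friction`) and the function `is_supported_shift` are hypothetical labels chosen for this example, not identifiers from ODRL. Levels are compared as strings, so a friction level must be written exactly as in the table (2.0, not 2).

```python
# Shift levels copied from the table above; the keys are illustrative labels.
_NUMERIC = ("0.1", "0.5", "2.0", "5.0")
_NAMED = ("easy", "medium", "hard")
_MAZE_IDS = ("1", "2", "3", "4", "5", "6")

SHIFT_LEVELS = {
    "locomotion-friction": _NUMERIC,
    "locomotion-gravity": _NUMERIC,
    "locomotion-kinematic": _NAMED,
    "locomotion-morphology": _NAMED,
    "antmaze-small": ("centerblock", "empty", "lshape", "zshape",
                      "reverseu", "reversel"),
    "antmaze-medium": _MAZE_IDS,
    "antmaze-large": _MAZE_IDS,
    "dexterous-manipulation": _NAMED,
}


def is_supported_shift(task, shift_level):
    """Return True if shift_level is listed for task in the table above."""
    if task not in SHIFT_LEVELS:
        raise KeyError(f"unknown task label: {task!r}")
    # Compare as strings so that 0.5 and '0.5' are treated alike.
    return str(shift_level) in SHIFT_LEVELS[task]
```

For instance, `is_supported_shift("locomotion-friction", 0.5)` holds, while `is_supported_shift("locomotion-kinematic", 0.5)` does not, since kinematic shifts use the named levels.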