BreakoutNoFrameskip-v4 Experiments
Description
BreakoutNoFrameskip-v4 experiments. Evaluation runs in parallel on separate environments every 250k timesteps (see code for details). Each run lasts 5M timesteps (roughly 23.15 hours of experience).
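The "hours of experience" figure can be reproduced with a short calculation. The sketch below assumes the emulator runs at 60 frames per second of game time and that, with NoFrameskip, one timestep corresponds to one frame. Neither assumption is stated in this document, and the helper name `timesteps_to_hours` is hypothetical.

```python
# Assumption (not stated in this document): 60 emulator frames per second
# of game time, and one timestep per frame under NoFrameskip.
FRAMES_PER_SECOND = 60
SECONDS_PER_HOUR = 3600


def timesteps_to_hours(timesteps, fps=FRAMES_PER_SECOND):
    """Convert environment timesteps to hours of game-time experience."""
    return timesteps / fps / SECONDS_PER_HOUR


total_timesteps = 5_000_000
eval_interval = 250_000

hours = timesteps_to_hours(total_timesteps)
num_evaluations = total_timesteps // eval_interval

print(f"{hours:.2f} hours of experience")        # 23.15 hours of experience
print(f"{num_evaluations} evaluation points")    # 20 evaluation points
```

Under these assumptions, a 5M-timestep run contains 20 evaluation points, one every 250k timesteps.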
Executive Summary
| | Experiment | AverageReturnPerkWh | AverageReturn | AsymptoticReturn | total_power | exp_len_hours | cpu_hours | gpu_hours | estimated_carbon_impact_kg |
---|---|---|---|---|---|---|---|---|---|
1 | PPO2 (stable_baselines, default settings) | 496.773 +/- 23.83 | 105.210 +/- 4.54 | 239.472 +/- 20.04 | 0.212 +/- 0.00 | 1.634 +/- 0.01 | 6.263 +/- 0.10 | 0.211 +/- 0.00 | 0.071 +/- 0.00 |
4 | A2C+Vtrace (cule, default settings) | 172.988 +/- 16.48 | 6.288 +/- 0.53 | 15.440 +/- 1.15 | 0.036 +/- 0.00 | 0.232 +/- 0.00 | 0.612 +/- 0.00 | 0.176 +/- 0.00 | 0.013 +/- 0.00 |
2 | A2C (stable_baselines, default settings) | 90.269 +/- 7.23 | 14.648 +/- 1.17 | 56.816 +/- 4.72 | 0.162 +/- 0.00 | 1.200 +/- 0.01 | 6.458 +/- 0.02 | 0.105 +/- 0.00 | 0.057 +/- 0.00 |
3 | DQN (stable_baselines, default settings) | 7.024 +/- 0.77 | 15.264 +/- 1.30 | 30.648 +/- 3.46 | 2.195 +/- 0.06 | 19.233 +/- 0.36 | 61.496 +/- 0.71 | 1.870 +/- 0.05 | 0.727 +/- 0.02 |
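As a rough sanity check on how the columns relate, the sketch below divides the mean AverageReturn by the mean total_power for each row and compares the result with the reported AverageReturnPerkWh. It assumes total_power is measured in kWh and that the metric is return divided by energy; the document does not confirm either. The small differences would be consistent with the ratio being computed per run before averaging, but that is also an assumption.

```python
# Mean values copied from the Executive Summary table:
# (AverageReturnPerkWh, AverageReturn, total_power)
# Assumption: total_power is in kWh.
rows = {
    "PPO2": (496.773, 105.210, 0.212),
    "A2C+Vtrace": (172.988, 6.288, 0.036),
    "A2C": (90.269, 14.648, 0.162),
    "DQN": (7.024, 15.264, 2.195),
}

relative_errors = {}
for name, (reported, avg_return, total_power) in rows.items():
    # Assumed relationship: return per kWh = average return / energy used.
    estimate = avg_return / total_power
    relative_errors[name] = abs(estimate - reported) / reported
    print(f"{name}: estimated {estimate:.3f}, reported {reported:.3f}")

# Rank experiments by the reported return per kWh, highest first.
ranking = sorted(rows, key=lambda name: rows[name][0], reverse=True)
print(ranking)
```

The ranking matches the row order of the table, which is sorted by AverageReturnPerkWh rather than by experiment number.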
Graphs