PongNoFrameskip-v4 Experiments
Description
PongNoFrameskip-v4 experiments. Agents are evaluated on separate environments every 250k timesteps in parallel (see code for details) and run for 5M timesteps (roughly 23.15 hours of experience).
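The 23.15-hour figure can be reproduced with a short calculation. The sketch below assumes that one timestep equals one emulator frame (no frame skipping) and that the game runs at 60 frames per second; neither value is stated in this report. The names `experience_hours`, `FRAMES_PER_SECOND`, and `eval_interval` are illustrative, not taken from the experiment code.

```python
# Assumption: with NoFrameskip, one timestep is one emulator frame,
# and the emulator runs at 60 frames per second.
FRAMES_PER_SECOND = 60
SECONDS_PER_HOUR = 3600


def experience_hours(timesteps: int, fps: int = FRAMES_PER_SECOND) -> float:
    """Convert environment timesteps to hours of simulated game time."""
    return timesteps / fps / SECONDS_PER_HOUR


total_timesteps = 5_000_000   # training budget from the description
eval_interval = 250_000       # evaluation frequency from the description

hours = experience_hours(total_timesteps)
# Number of 250k-step intervals in the run; whether an extra evaluation
# happens at step 0 is not stated here, so this counts intervals only.
num_intervals = total_timesteps // eval_interval

print(f"{hours:.2f} hours of experience")  # 23.15 hours of experience
print(f"{num_intervals} evaluation intervals")
```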
Executive Summary
| | Experiment | AverageReturnPerkWh | AverageReturn | AsymptoticReturn | total_power | exp_len_hours | cpu_hours | gpu_hours | estimated_carbon_impact_kg |
---|---|---|---|---|---|---|---|---|---|
4 | A2C+Vtrace (cule, default settings) | 214.192 +/- 14.36 | 11.333 +/- 0.72 | 19.544 +/- 0.32 | 0.053 +/- 0.00 | 0.333 +/- 0.01 | 1.435 +/- 0.35 | 0.145 +/- 0.03 | 0.011 +/- 0.00 |
1 | PPO2 (stable_baselines, default settings) | 50.397 +/- 5.91 | 11.057 +/- 1.02 | 19.968 +/- 0.40 | 0.222 +/- 0.01 | 1.587 +/- 0.04 | 6.252 +/- 0.14 | 0.220 +/- 0.00 | 0.075 +/- 0.00 |
3 | DQN (stable_baselines, default settings) | 18.234 +/- 1.15 | 17.555 +/- 0.23 | 20.600 +/- 0.15 | 0.978 +/- 0.06 | 9.206 +/- 1.10 | 18.584 +/- 0.10 | 1.732 +/- 0.02 | 0.299 +/- 0.03 |
2 | A2C (stable_baselines, default settings) | -103.178 +/- 1.60 | -17.045 +/- 0.21 | -6.080 +/- 1.13 | 0.165 +/- 0.00 | 1.265 +/- 0.02 | 6.185 +/- 0.04 | 0.110 +/- 0.00 | 0.058 +/- 0.00 |
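
The column name AverageReturnPerkWh suggests that it is AverageReturn divided by energy use, with total_power reported in kWh. That reading is an assumption; the report does not define the columns. The sketch below checks it against the rounded means in the table. Small mismatches are expected, both because the table values are rounded and because the ratio may have been computed per run before averaging.

```python
# Rounded means copied from the table:
# (AverageReturnPerkWh, AverageReturn, total_power)
rows = {
    "A2C+Vtrace (cule)": (214.192, 11.333, 0.053),
    "PPO2 (stable_baselines)": (50.397, 11.057, 0.222),
    "DQN (stable_baselines)": (18.234, 17.555, 0.978),
    "A2C (stable_baselines)": (-103.178, -17.045, 0.165),
}

relative_errors = {}
for name, (reported, avg_return, total_power_kwh) in rows.items():
    # Assumption: total_power is measured in kWh.
    ratio = avg_return / total_power_kwh
    relative_errors[name] = abs(ratio - reported) / abs(reported)
    print(f"{name}: recomputed {ratio:.1f}, reported {reported:.1f}")

# Largest disagreement between the recomputed and reported ratios.
worst = max(relative_errors.values())
print(f"largest relative difference: {worst:.1%}")
```

Under this reading, the recomputed ratios agree with the reported column to within about 2% for all four experiments, which is consistent with the assumed definition but does not confirm it.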
Graphs