Experiment Sets
PongNoFrameskip-v4 Experiments


Description
PongNoFrameskip-v4 experiments. Each agent is evaluated on separate environments, in parallel, every 250k timesteps (see code for details) and is trained for 5M timesteps (roughly 23.15 hrs of experience).
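The "23.15 hrs of experience" figure can be reproduced with a short calculation. This is a minimal sketch that assumes one timestep equals one emulator frame (as the NoFrameskip variant implies) and a frame rate of 60 frames per second; the constant and function names are illustrative and do not come from the experiment code. The evaluation count also assumes one evaluation per completed 250k-step interval.

```python
# Assumption: the Atari emulator runs at 60 frames per second and, with
# NoFrameskip, each environment timestep corresponds to exactly one frame.
ATARI_FPS = 60

TOTAL_TIMESTEPS = 5_000_000   # training budget stated in the description
EVAL_INTERVAL = 250_000       # evaluation period stated in the description


def experience_hours(timesteps, fps=ATARI_FPS):
    """Convert a number of emulator frames into hours of game time."""
    return timesteps / fps / 3600


hours = experience_hours(TOTAL_TIMESTEPS)
# Assumption: one evaluation per completed interval, none at step 0.
n_evals = TOTAL_TIMESTEPS // EVAL_INTERVAL

print(f"{hours:.2f} hours of experience, {n_evals} evaluation intervals")
```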
Executive Summary
# | Experiment | AverageReturnPerkWh | AverageReturn | AsymptoticReturn | total_power | exp_len_hours | cpu_hours | gpu_hours | estimated_carbon_impact_kg
4 | A2C+Vtrace (cule, default settings) | 214.192 +/- 14.36 | 11.333 +/- 0.72 | 19.544 +/- 0.32 | 0.053 +/- 0.00 | 0.333 +/- 0.01 | 1.435 +/- 0.35 | 0.145 +/- 0.03 | 0.011 +/- 0.00
1 | PPO2 (stable_baselines, default settings) | 50.397 +/- 5.91 | 11.057 +/- 1.02 | 19.968 +/- 0.40 | 0.222 +/- 0.01 | 1.587 +/- 0.04 | 6.252 +/- 0.14 | 0.220 +/- 0.00 | 0.075 +/- 0.00
3 | DQN (stable_baselines, default settings) | 18.234 +/- 1.15 | 17.555 +/- 0.23 | 20.600 +/- 0.15 | 0.978 +/- 0.06 | 9.206 +/- 1.10 | 18.584 +/- 0.10 | 1.732 +/- 0.02 | 0.299 +/- 0.03
2 | A2C (stable_baselines, default settings) | -103.178 +/- 1.60 | -17.045 +/- 0.21 | -6.080 +/- 1.13 | 0.165 +/- 0.00 | 1.265 +/- 0.02 | 6.185 +/- 0.04 | 0.110 +/- 0.00 | 0.058 +/- 0.00
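The efficiency column can be sanity-checked against the other columns. The sketch below assumes that total_power is reported in kWh and that AverageReturnPerkWh is the average return divided by that energy; neither is stated explicitly in the table. Under those assumptions the recomputed ratios land within about 2% of the reported values. The remaining gap would be consistent with the report averaging per-run ratios rather than dividing the two column means, but that is also an assumption.

```python
# Mean values copied from the summary table:
# (AverageReturn, total_power, reported AverageReturnPerkWh).
# Assumption: total_power is in kWh.
rows = {
    "A2C+Vtrace (cule)": (11.333, 0.053, 214.192),
    "PPO2 (stable_baselines)": (11.057, 0.222, 50.397),
    "DQN (stable_baselines)": (17.555, 0.978, 18.234),
    "A2C (stable_baselines)": (-17.045, 0.165, -103.178),
}


def return_per_kwh(avg_return, energy_kwh):
    """Average return obtained per kWh of energy consumed."""
    return avg_return / energy_kwh


def relative_gap(recomputed, reported):
    """Relative difference between a recomputed and a reported value."""
    return abs(recomputed - reported) / abs(reported)


for name, (avg_return, energy_kwh, reported) in rows.items():
    recomputed = return_per_kwh(avg_return, energy_kwh)
    gap = relative_gap(recomputed, reported)
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported:.2f}, gap {gap:.1%}")
```

Note that the A2C (stable_baselines) row has a negative ratio because its average return is negative, not because it used less energy; the ranking by AverageReturnPerkWh therefore mixes return quality and energy cost.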
Graphs