Codeball environment for PufferLib

30th Jan 2025
Tags: 3d, wip

Codeball

In the 2010s, the Russian mail provider mail.ru ran Russian AI Cup, a series of competitions in programming game AI agents. I participated in 2017 and 2018. The second of those competitions, Codeball, was about programming a team of soccer-playing robots.

Most participants used a model predictive control approach: they simulated the game a few steps ahead and chose the best action. The simulations were easy to write because the competition organizers provided very detailed Rust-like pseudocode with comments. I translated it to Kotlin and Cython, and it was easy to make fast.
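
The simulate-ahead-and-pick-the-best idea can be sketched in a few lines. This is only a toy illustration, not the competition code: the 1D point-mass `simulate` function, the three candidate accelerations, and the distance-to-target cost are all assumptions made up for the example, standing in for the real game simulator and a real evaluation function.

```python
def simulate(pos, vel, accel, dt=0.1):
    """Toy 1D point-mass step (hypothetical stand-in for the game simulator)."""
    vel = vel + accel * dt
    pos = pos + vel * dt
    return pos, vel


def plan(pos, vel, target, horizon=10, candidates=(-1.0, 0.0, 1.0)):
    """Roll each candidate action forward and keep the one with the lowest cost."""
    best_action, best_cost = None, float("inf")
    for accel in candidates:
        p, v = pos, vel
        for _ in range(horizon):
            p, v = simulate(p, v, accel)
        cost = abs(target - p)  # distance to the target at the end of the rollout
        if cost < best_cost:
            best_action, best_cost = accel, cost
    return best_action


def run_episode(target=5.0, steps=100):
    """Receding-horizon loop: replan at every step, apply only the first action."""
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        accel = plan(pos, vel, target)
        pos, vel = simulate(pos, vel, accel)
    return pos
```

The important part is the loop in `run_episode`: the plan is thrown away and recomputed after every real step, which is why a fast simulator matters so much for this approach.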

PufferLib

PufferLib is Dr. Joseph Suarez's library for reinforcement learning research. What distinguishes it from other libraries is its emphasis on performance, especially that of the environments. The environments typically run at millions of steps per second, so the CPU code is not a bottleneck. Environments are encouraged to be written in pure C with raylib for visualization.

Codeball + PufferLib

Translating the pseudocode to PufferLib was trivial: I copy-pasted it into Gemini 1.5 Flash and the generated code worked out of the box. The only thing I had to add was the 3D visualization code. I used raylib's 3D OpenGL wrapper to render the field and the robots. I added a basic shader to make the environment look nicer and included it in the source code as two header files generated by xxd -i. With a scripted policy, the agents run towards the ball.
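
A run-towards-the-ball policy of this kind could look roughly like the sketch below. It is not the code from the environment: the function name, the (x, z) ground-plane convention, and the `max_speed` default are assumptions chosen for illustration.

```python
import math


def chase_ball_action(robot_xz, ball_xz, max_speed=30.0):
    """Return a target ground velocity pointing from the robot to the ball.

    robot_xz and ball_xz are (x, z) positions on the ground plane.
    max_speed is a hypothetical speed limit, not a value taken from the game.
    """
    dx = ball_xz[0] - robot_xz[0]
    dz = ball_xz[1] - robot_xz[1]
    dist = math.hypot(dx, dz)
    if dist < 1e-9:
        # Already at the ball: no preferred direction, so stand still.
        return (0.0, 0.0)
    scale = max_speed / dist
    return (dx * scale, dz * scale)
```

For example, `chase_ball_action((0.0, 0.0), (3.0, 4.0), max_speed=10.0)` points along the (3, 4) direction with length 10.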

Writing the Cython bindings was also simple. I heavily referenced the existing environments. I just needed to optimize my initial solution by storing everything in a single buffer and avoiding unnecessary copies.
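
The single-buffer idea can be illustrated without Cython. In the sketch below, which uses only the standard library, every environment writes its observations directly into its own slice of one shared float buffer, so nothing has to be copied when the batch is handed to the learner. The class name `ToyEnv` and the sizes are hypothetical; this shows the general pattern, not the actual binding code.

```python
from array import array

NUM_ENVS = 4  # hypothetical sizes, chosen only for illustration
OBS_SIZE = 6

# One contiguous float32 buffer holding the observations of all environments.
shared = array("f", [0.0] * (NUM_ENVS * OBS_SIZE))
shared_view = memoryview(shared)


class ToyEnv:
    def __init__(self, obs_view):
        # Slicing a memoryview does not copy: this is a window into `shared`.
        self.obs = obs_view

    def step(self, value):
        # Write in place instead of building and returning a new list.
        for i in range(len(self.obs)):
            self.obs[i] = value


envs = [
    ToyEnv(shared_view[i * OBS_SIZE:(i + 1) * OBS_SIZE])
    for i in range(NUM_ENVS)
]
envs[2].step(1.5)  # only the third environment's slice of `shared` changes
```

Basic slices of a NumPy array behave the same way, which is what makes this layout convenient when the buffer is later consumed as one batched array.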

The reward function is configurable through a config file, just like in other PufferLib environments. The user can make it take into account the position of the ball, whether a goal was scored, and the position of the player. However, I have not yet found a combination of hyperparameters that makes the agents learn to play well, or even reliably run towards the ball. I will update this post when I do.
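
To make the idea of a configurable reward concrete, here is a minimal sketch of a weighted sum whose weights come from an INI-style config. The section name, the three keys, their values, and the exact form of each term are all hypothetical; they are not the names or defaults used by the environment.

```python
import configparser

# Hypothetical config fragment; the keys and values are illustrative only.
EXAMPLE_CONFIG = """
[env]
goal_scored_reward = 1.0
ball_position_weight = 0.1
player_to_ball_weight = 0.01
"""


def load_weights(text):
    """Parse the [env] section into a dict of float weights."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: float(value) for key, value in parser["env"].items()}


def reward(weights, goal_scored, ball_progress, player_ball_distance):
    """Weighted sum of the three signals.

    goal_scored: whether the agent's team scored on this step.
    ball_progress: ball position along the field, towards the opponent's goal.
    player_ball_distance: distance from the player to the ball (penalized).
    """
    r = weights["goal_scored_reward"] * float(goal_scored)
    r += weights["ball_position_weight"] * ball_progress
    r -= weights["player_to_ball_weight"] * player_ball_distance
    return r
```

With a layout like this, changing the balance between the sparse goal reward and the dense shaping terms only requires editing the config, not the environment code.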