Rebuilding the adversarial KataGo testbench, Part 2
In Part 1, I got a simple evaluation harness working and used a few smoke tests to confirm that the old failure mode was still reproducible. In one 10-game batch, the adversary won 7 games outright, while the other 3 ran past the move cap and were recorded as no result.
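The bookkeeping behind that tally is simple enough to sketch. Below is a minimal Python illustration of how the smoke test buckets finished games; the `GameRecord` shape and the move-cap value are assumptions for the example, not the harness's actual interface:

```python
from collections import Counter
from dataclasses import dataclass

MOVE_CAP = 1600  # hypothetical cap for this example; the harness's real cap may differ

@dataclass
class GameRecord:
    winner: str  # "adversary", "victim", or "" when no one won
    moves: int

def classify(game: GameRecord) -> str:
    """Bucket a game the way the smoke test tallies it."""
    if game.moves >= MOVE_CAP:
        return "no result"  # ran past the move cap without a decision
    return "adversary win" if game.winner == "adversary" else "victim win"

# The 10-game smoke batch: 7 adversary wins, 3 games that hit the cap.
games = [GameRecord("adversary", 212)] * 7 + [GameRecord("", MOVE_CAP)] * 3
counts = Counter(classify(g) for g in games)
print(dict(counts))  # {'adversary win': 7, 'no result': 3}
```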
That baseline was strong enough to show that the adversary really did have a clear edge over the victim it was trained against. So the next obvious question was whether it would still do anything against a newer, much stronger KataGo checkpoint: kata1-b28c512nbt-s12192929536-d5655876072 (14097.6 Elo), versus kata1-b40c256-s11840935168-d2898845681 (baseline victim, 13410.3 Elo).
The original paper [1] evaluated several victim visit settings (1, 4096, 10^6, and 10^7) while capping the adversary at 600 visits. By my rough estimate, my poor RTX 4060 Ti would not be able to finish even 10 games in a reasonable amount of time once the victim got anywhere near 1000 visits. So I went with a smaller-scale sweep instead: victim visits at 1, 10, and 100, with the adversary still capped at 600. My bold prediction was that the newer model would win outright anyway, so I felt fine letting the adversary keep its usual 600-visit edge.
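For concreteness, the sweep setup can be sketched in a few lines of Python. The `TierConfig` dataclass and the `run_match` hook mentioned in the comment are hypothetical stand-ins for however the harness actually launches a KataGo match:

```python
from dataclasses import dataclass

ADVERSARY_VISITS = 600  # adversary kept at its usual visit cap
GAMES_PER_TIER = 20     # games per victim-visit setting

@dataclass(frozen=True)
class TierConfig:
    victim_visits: int
    adversary_visits: int = ADVERSARY_VISITS
    games: int = GAMES_PER_TIER

def build_sweep(victim_tiers=(1, 10, 100)):
    """One match configuration per victim-visit tier."""
    return [TierConfig(victim_visits=v) for v in victim_tiers]

for cfg in build_sweep():
    # run_match(cfg) would launch the actual games; omitted here.
    print(cfg)
```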
| Victim visits | Games | Victim wins | Victim win rate |
|---|---|---|---|
| 1 | 20 | 20 | 100% |
| 10 | 20 | 20 | 100% |
| 100 | 20 | 20 | 100% |
The adversary could not steal even a single game, not even at the lowest tier, where the victim's single visit amounts to playing straight from the raw policy network. On one hand, I was disappointed by how lackluster the adversary looked here. On the other hand, I am also thankful that KataGo is still being actively improved. Even after already surpassing the strongest human players by a huge margin, the project is still getting better.
References
1. Wang et al., *Adversarial Policies Beat Superhuman Go AIs*, 2023.