Rebuilding the adversarial KataGo testbench, Part 2
In Part 1, I got a simple evaluation harness working and used a few smoke tests to confirm that the old failure mode was still reproducible. In one 10-game batch, the adversary won 7 games outright, while the other 3 ran past the move cap and were recorded as no result.
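The bookkeeping behind that tally is simple enough to sketch. Below is a minimal Python illustration of how the smoke test buckets finished games; the `GameRecord` shape and the move-cap value are assumptions for the example, not the harness's actual interface:

```python
from collections import Counter
from dataclasses import dataclass

MOVE_CAP = 1600  # hypothetical cap for this example; the harness's real cap may differ

@dataclass
class GameRecord:
    winner: str  # "adversary", "victim", or "" when no one won
    moves: int

def classify(game: GameRecord) -> str:
    """Bucket a game the way the smoke test tallies it."""
    if game.moves >= MOVE_CAP:
        return "no result"  # ran past the move cap without a decision
    return "adversary win" if game.winner == "adversary" else "victim win"

# The 10-game smoke batch: 7 adversary wins, 3 games that hit the cap.
games = [GameRecord("adversary", 212)] * 7 + [GameRecord("", MOVE_CAP)] * 3
counts = Counter(classify(g) for g in games)
print(dict(counts))  # {'adversary win': 7, 'no result': 3}
```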
That baseline was strong enough to show that the adversary really did have a clear edge over the victim it was trained against. So the next obvious question was whether it would still do anything against a newer, much stronger KataGo checkpoint: kata1-b28c512nbt-s12192929536-d5655876072 (14097.6 Elo), versus kata1-b40c256-s11840935168-d2898845681 (baseline victim, 13410.3 Elo).
The original paper [1] evaluated several victim visit settings (1, 4096, 10^6, and 10^7) while capping the adversary at 600 visits. By my rough estimate, my poor RTX 4060 Ti would not be able to finish even 10 games in a reasonable amount of time once the victim got anywhere near 1000 visits. So I went with a smaller-scale sweep instead: victim visits at 1, 10, and 100, with the adversary still capped at 600. My bold prediction was that the newer model would win outright anyway, so I felt fine letting the adversary keep its usual 600-visit edge.
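For concreteness, the sweep setup can be sketched in a few lines of Python. The `TierConfig` dataclass and the `run_match` hook mentioned in the comment are hypothetical stand-ins for however the harness actually launches a KataGo match:

```python
from dataclasses import dataclass

ADVERSARY_VISITS = 600  # adversary kept at its usual visit cap
GAMES_PER_TIER = 20     # games per victim-visit setting

@dataclass(frozen=True)
class TierConfig:
    victim_visits: int
    adversary_visits: int = ADVERSARY_VISITS
    games: int = GAMES_PER_TIER

def build_sweep(victim_tiers=(1, 10, 100)):
    """One match configuration per victim-visit tier."""
    return [TierConfig(victim_visits=v) for v in victim_tiers]

for cfg in build_sweep():
    # run_match(cfg) would launch the actual games; omitted here.
    print(cfg)
```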
| Victim visits | Games | Victim wins | Victim win rate |
|---|---|---|---|
| 1 | 20 | 20 | 100% |
| 10 | 20 | 20 | 100% |
| 100 | 20 | 20 | 100% |
The adversary could not steal even a single game, not even at the lowest tier, where the victim's single visit amounts to playing straight from the raw policy network. On one hand, I was disappointed by how lackluster the adversary looked here. On the other hand, I am also thankful that KataGo is still being actively improved. Even after already surpassing the strongest human players by a huge margin, the project is still getting better.
References
1. Wang et al., *Adversarial Policies Beat Superhuman Go AIs*, 2023.