Some messy thoughts, not carefully organized, and without much thought about where to say what.
Having been absent for a year, Fine Art was the opponent we most anticipated. The final score of 4:1 was in line with our expectations, but there is still a little regret.
On the first night of the final, we carefully analyzed the three games played that day, especially the one we lost, and found some potential problems. After a series of urgent corrections and adjustments, we built a new version and spent the whole night testing it against the previous one. The result was 32:6. This is why Golaxy seemed stronger in the later rounds of the final. In the fourth game, playing White, Golaxy’s win rate never dropped below 55%. The decisive blow came from Black’s mistake at move 169, which raised White’s win rate to 82% and effectively ended the game. In the fifth game, playing Black, Golaxy failed to break Fine Art’s White fortress. According to Dr. Ma, the win rate in this game rose more slowly than in any other game.
Some people have speculated about Fine Art’s configuration. This did not bother the team enough for them to clarify, but they still gave some information: Fine Art’s lead programmer, answering an audience question, said that the configuration had not changed.
There is still a gap between Golaxy and Fine Art, in both software and hardware. But this gap has narrowed greatly since last year’s Tencent AI contest. If at that time we couldn’t even touch Fine Art’s edge, now we can clearly see how big the gap is. Had we found those problems early and corrected them in this version, I think Golaxy still could not have beaten Fine Art when Fine Art had White, but when Fine Art had Black it most probably could not have overcome Golaxy’s initial advantage as White. The overall score would be about 6:4 in Fine Art’s favor. Over just 5 games, the luck of who gets which color might decide the final result.
Why did we not discover Golaxy’s problems earlier? Because besides hardware and software, the gap between Golaxy and Fine Art shows up in a more important aspect: experience.
AWS provided a computing allowance of 10,000 yuan to every participating team, enough to run an 8-V100 server for more than 200 hours, which covers the whole event. This is also the configuration most teams used. A team with a distributed engine could use more machines at the same time. Not knowing what Fine Art and AQZ would enter with, Golaxy had expanded its engine to handle more than 100,000 playouts per second. We used unprecedented computing power; we don’t know how much Fine Art had, but last year we couldn’t have used even one tenth of what we used this year. (Other teams could also have used ten times as much.)
These resources came from AWS and were only in place the day before the games began. We lacked experience with this environment. Throughout the event we kept making small adjustments, changing the code every night: 11 versions in the 5 days of play. Although we had run simulation tests beforehand, many problems only surface in the real environment. At low playouts (1,000 to 2,000 per second), medium playouts (10,000 per second) and high playouts (100,000+ per second), the operating parameters of the MCTS engine, the shape of the search tree, and the adjustments to the neural network output are all quite different.
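To illustrate why one set of search parameters can behave very differently at 1,000 versus 100,000 playouts, here is a toy sketch using the AlphaGo Zero-style PUCT selection rule (Q + U, with U = c_puct · P · sqrt(N) / (1 + n)). This is not Golaxy’s engine: it looks only at the root, holds each move’s value Q fixed, and the priors, values and c_puct = 1.5 are hypothetical numbers chosen for illustration.

```python
import math


def puct_root_visits(priors, values, playouts, c_puct=1.5):
    """Toy root-only PUCT: each simulation picks argmax(Q + U) and adds one visit.

    Q (values) is held fixed per move as a hypothetical stand-in for backed-up
    search values, so this only shows how the visit distribution shifts with
    the playout budget under the same exploration constant.
    """
    visits = [0] * len(priors)
    for total in range(playouts):
        # Use sqrt(max(total, 1)) so the very first pick is not decided by Q alone.
        sqrt_total = math.sqrt(max(total, 1))

        def score(a):
            exploration = c_puct * priors[a] * sqrt_total / (1 + visits[a])
            return values[a] + exploration

        best = max(range(len(priors)), key=score)
        visits[best] += 1
    # Return each move's share of the root visits.
    return [v / playouts for v in visits]


# Hypothetical root: move 0 has the highest prior, move 1 the highest value.
priors = [0.6, 0.3, 0.1]
values = [0.50, 0.55, 0.45]
low = puct_root_visits(priors, values, 1_000)
high = puct_root_visits(priors, values, 100_000)
print([round(s, 2) for s in low])
print([round(s, 2) for s in high])
```

With few playouts the prior still pulls many visits toward move 0; with many playouts the exploration term shrinks relative to Q and visits concentrate on move 1, so the tree’s shape (and the right c_puct) depends on the budget.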
Fine Art’s team started working at least three years ago, and Jin Mao has been with them for almost two years. Compared with them, Golaxy is just getting started. High-playout games are costly to test, and the stronger the test opponent, the more easily potential problems are exposed and the clearer the direction of improvement. In the past two years, Fine Art has been Golaxy’s most frequent opponent, which has helped a lot. Even now, the 11 games against Fine Art since the end of last year are taken out from time to time to check whether we can now find the blind spots we had then, whether there is a better candidate move, whether the position evaluation has improved, and so on. From this perspective, I am really grateful to Fine Art.
Why didn’t you do this work earlier? The reason is simple: no machines.
On August 2nd, our computer room finally got its long-awaited hardware upgrade, which I had been waiting for for more than a year. Using the popular V100 as the unit of measurement, this batch of machines is equivalent to about 40 V100s. In the past three weeks, we ran a batch of games, trained a large neural network, and brought the network out without further testing. There was really no spare capacity to test and tune the high-playout engine.
What were you doing all that time before? From the founding of Golaxy until August, for most of that time the team’s total computing resources were equivalent to about 12 V100s. Yes, you read that right: development, testing, training, and all the computation for game records together ran on about 12 V100s.
We were often bursting with all kinds of ideas at our seminars, but we could only test one idea at a time, wait a week or even several weeks to see the effect, and then try the next one. Many ideas were put on hold and then forgotten. We were forced to pay great attention to hardware performance: every graphics card ran in parallel, with hundreds of copies running at the same time to make full use of large batch sizes. For nearly a year, our GPUs never stopped running except for a minute or so of power outages.
How stretched we were!
Fortunately, the hard days are finally over. We heard that in a few months we will get a new batch of computing power. Four months from now, Golaxy’s resources will certainly be of a greater magnitude than those of the version that played the April Fuzhou competition.
I’m starting to feel emotional over this AI.
The most unexpected result was AQZ, which had hinted it might win the championship but did not even reach the final four. Several opponents beat it in the preliminaries with calm, cautious play. Their biggest mistake may have been building the program behind closed doors, buying more than 4,000 GPUs, and then rushing out after a year without having shown itself, still taking last year’s competition as the standard. It is true that at AQZ’s current level it would have won the championship last year. I saw a pre-match interview saying that they still train with the AGZ method, just with multi-agent training added, which we think is a half-hearted effort. Compared with the model described in the AGZ paper, Golaxy’s current level should have exceeded its ceiling. [NOTE: He thinks both Golaxy and Fine Art exceeded AlphaGo’s level] As for multi-agent training, after AlphaStar announced this method, quite a few Go AIs followed its lead; we think it does more harm than good. After watching the first few games of the preliminaries, I was completely reassured that this version of AQZ could not beat Golaxy.
LeelaZero finishing fourth was slightly unexpected. From the preliminaries to the semi-finals, LZ seemed stronger than HanDol. After the third-place match, Go fans guessed that having Black was the main reason. By the way, LZ has a problem judging the dagger joseki [NOTE: dagger or sickle, referring to this Josekipedia ] and its follow-ups. In the third game of the semi-final, LZ’s self-evaluated win rate was at one point 6x%. At the same moment Golaxy’s self-evaluation was 8x%, so the two sides added up to 140% and the error bubble was a full 40%. By contrast, the self-evaluated win rates of Golaxy and Fine Art almost never sum to more than 105%.
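The "error bubble" arithmetic above can be written as a tiny helper. This is a minimal sketch under the assumption that each engine reports, at the same moment, its own win probability for the same game in whole percent; the function name is hypothetical. Since exactly one side wins, calibrated estimates should sum to about 100, and the excess measures joint overconfidence.

```python
def error_bubble(self_eval_a: int, self_eval_b: int) -> int:
    """Percentage points by which two engines' self-evaluations exceed 100%.

    Each argument is one engine's own estimate (whole percent) that it will
    win the same game, taken at the same moment. A negative result would mean
    the two engines are jointly underconfident.
    """
    if not (0 <= self_eval_a <= 100 and 0 <= self_eval_b <= 100):
        raise ValueError("win rates must be percentages in [0, 100]")
    return self_eval_a + self_eval_b - 100


# Illustrative values only: the post gives LZ as "6x%" and Golaxy as "8x%",
# summing to 140%; 60 and 80 are picked to match that total.
print(error_bubble(60, 80))  # 40
# A hypothetical pair summing to 105%, the rough ceiling quoted for Golaxy vs Fine Art.
print(error_bubble(52, 53))  # 5
```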
HanDol pulled off some wonderful escapes. It was indeed convincing that this first appearance in competition was an original work, and Mr. Lin, HanDol’s author, had a good time as always. Xiaotian plays a high style. He said that the style was modeled on that of Nie Lao, and he wasn’t concerned about comparisons with other teams.