Dwyrin is testing the AI / Suggestion for improvement

Ok, let’s try to put it in other words, and describe it as an algorithm:

  • The goal is to find the most game-changing move of a finished Go game.
  • First, we compute LZ’s winrate estimate at each position of the game, but we let LZ do only a single playout (so it is a very rough guess).
  • Next, for each position X of the game (except the first), we compute the relative winrate: the difference between LZ’s winrate estimate at position X and its estimate at position X−1, making sure to take the winrate of the player who played move X in both cases.
  • Now we repeat the following steps until the result is satisfactory:
    – Select the board position that most needs exploration: either one this algorithm has rarely visited, or one with a high relative winrate (see the “Exploration and exploitation” section of the MCTS Wikipedia page).
    – Let LZ add another playout to its estimates for the chosen position and the position just before it, then recompute the relative winrate.
    – Update the chosen position: increment its visit count and add the new relative winrate to its total score (a high score means a high relative winrate).
  • After several rounds, pick the position that has been visited most often.

It’s basically MCTS, where the simulation phase starts immediately and is replaced by LZ performing an extra playout.
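The loop above can be sketched in Python. This is only an illustration of the idea, not Dwyrin’s or anyone’s actual implementation: `lz_winrate_estimate` is a hypothetical stub standing in for a real query to Leela Zero, and the selection step uses the standard UCB1 rule from the MCTS Wikipedia article to trade off high relative winrate against low visit counts.

```python
import math
import random

def lz_winrate_estimate(position, playouts):
    """Stand-in for Leela Zero's winrate estimate at a board position.
    Hypothetical stub: a real implementation would run LZ on that position
    with the given playout budget. Seeded so the sketch is reproducible."""
    rng = random.Random(position * 10007 + playouts)
    return rng.random()

def most_game_changing_move(num_positions, rounds=200, c=1.4):
    playouts = [1] * num_positions   # one initial LZ playout per position
    visits = [0] * num_positions     # how often each move was selected
    score = [0.0] * num_positions    # accumulated relative winrates

    def relative_winrate(x):
        # Winrate change caused by move x, from the point of view of the
        # player who played it (the stub reports winrate for the side to
        # move, so the estimate at position x is flipped).
        return (1.0 - lz_winrate_estimate(x, playouts[x])) \
            - lz_winrate_estimate(x - 1, playouts[x - 1])

    for t in range(1, rounds + 1):
        # Selection: try every position once, then apply UCB1, which
        # favours positions with a high mean score or few visits.
        unvisited = [x for x in range(1, num_positions) if visits[x] == 0]
        if unvisited:
            x = unvisited[0]
        else:
            x = max(range(1, num_positions),
                    key=lambda i: score[i] / visits[i]
                    + c * math.sqrt(math.log(t) / visits[i]))
        # "Simulation": give LZ one extra playout at x and at x-1.
        playouts[x] += 1
        playouts[x - 1] += 1
        # Backpropagation: record the visit and the new relative winrate.
        visits[x] += 1
        score[x] += relative_winrate(x)

    # Final answer: the most-visited position.
    return max(range(1, num_positions), key=lambda i: visits[i])
```

With a real engine behind `lz_winrate_estimate`, the per-position playout counts would grow exactly where the search spends its budget, which is the point of the exploration/exploitation trade-off.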
