Dwyrin is testing the AI / Suggestion for improvement

Please don’t post misinformation. In the OGS feature, a negative delta always means a mistake (whether by white or black). This is easy to confirm by subtracting the winrates before and after the move.

In the first Dwyrin game analyzed by OGS, moves 64 and 138 (both even, so played by the same player) show deltas of +16.9pp and -36.6pp.
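For anyone who wants to check: the delta is just the mover’s winrate after the move minus the winrate before it. A tiny sketch (the before/after winrates below are made up, chosen only to reproduce the -36.6pp example):

```python
def move_delta(winrate_before, winrate_after):
    """Winrate delta in percentage points, from the point of view of
    the player who made the move; negative always means a mistake."""
    return winrate_after - winrate_before

print(move_delta(55.0, 18.4))  # -36.6 pp: a big mistake
```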

1 Like

A positive delta generally just means that the bot misevaluated the previous position (in terms of the usual alpha-beta tree search that most people have in mind when thinking about go engines).

So really it’s just that the search uncovered some move that was not searched in previous iterations, and that move severely changes the overall result.

2 Likes

Yes, and the large percentage means that it didn’t really judge the previous position well, which is because the score for the previous position is no longer investigated after the first hunch (right?)

Here’s an efficient way to improve the hunches: use Monte Carlo Tree Search

Let’s call it the Hunch game, and call the game that is being reviewed the Go game.

  • Possible moves in the Hunch game are selections of a move in the Go game. Winning the Hunch game means selecting the most game-changing move of the Go game.
  • After selecting a move in the Hunch game, the simulation phase starts: LZ does another playout in the Go game for the selected move of the Hunch game, and for the move before it (so this requires a parallel LZ evaluation for each move in the Go game, each staying inactive until another playout is requested).
  • A simulation is considered a win if its winrate delta (between the two winrates LZ just computed) is among the three lowest of the whole Hunch game (lowest rather than highest, since mistakes have negative deltas).
  • Choosing which move of the Hunch game to explore next is done by the same heuristic as in normal MCTS (see the sketch of that heuristic just below).
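For reference, the “same heuristic as in normal MCTS” is usually UCT: score each candidate by its average reward plus an exploration bonus, and pick the highest. A minimal Python sketch, not tied to any particular engine API:

```python
import math

def uct_score(total_reward, visits, parent_visits, c=1.4):
    """Standard UCT selection: exploit candidates with a high average
    reward, but keep exploring rarely visited ones; c balances the two."""
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)
```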

This will converge towards the three most terrible moves of the Go game far more efficiently than the current approach. What is used now is a uniform search over the whole Go game, followed by more time spent reading deeply without improving the Hunch game.

Using my suggested MCTS approach would both be a better heuristic for finding the optimal move in the Hunch game, and would already compute LZ’s deeper reading of those optimal moves along the way.

1 Like

Of course, in the case of a negative delta, it can be either

a) a suboptimal move

or

b) the same case of a mistaken hunch

The problem is, in these reviews you can only guess which of the two it is, based on whether you played the engine’s highest-rated move or not.

But I like the system of determining the delta between the “winrate of the optimal move” and the “winrate of the actual move”.

Hence my suggestion above for an improved way of finding negative deltas that correspond to a suboptimal move, and not to a mistaken hunch.
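Roughly, that guess goes like this (the decision rule below is only an illustration, not how OGS actually labels moves):

```python
def classify_drop(played_move, engine_top_move, delta):
    """delta = mover's winrate after the move minus before it."""
    if delta >= 0:
        return "no drop"
    if played_move == engine_top_move:
        # Even the engine's favourite move loses winrate here, so the
        # earlier evaluation was probably just a mistaken hunch.
        return "likely a mistaken hunch"
    # Otherwise we can only guess; it may be a genuinely suboptimal move.
    return "possibly a suboptimal move"
```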

1 Like

Yeah, I just added that I liked that solution in an edit :stuck_out_tongue:

Oh well, now everyone can know it wasn’t in the original post

1 Like

I don’t get it.

For what? And why MC?

Call what the Hunch game, if not the reviewed game?

Not Monte Carlo; I meant specifically the search algorithm MCTS, or rather a variant of it (like how LZ uses an MCTS variant to find the best move).

By the reviewed game you mean the game that was played, so the game of go the two players played?
I, however, want to pose finding the most game-changing move as a game in its own right, since talking about games is a good way of talking about these kinds of algorithms. For example, the MCTS algorithm above is described in terms of game theory.

I suggest finding the most game-changing move with an MCTS-inspired algorithm that does not rely on Monte Carlo random play in the simulation phase, but instead uses playouts by LZ to determine the winrate change.

I could describe it more step-by-step, if that is helpful.

I think my main problem at the moment is understanding what this other game is. I’m pretty sure it is not fully independent of the game being reviewed, but the way you describe it makes it sound largely independent to me.

Ok, let’s try to put it in other words, and describe it as an algorithm:

  • The goal is to find the most game-changing move of a finished Go game.
  • First, we compute LZ’s winrate estimate at each position of the Go game, but we let LZ do only a single playout (i.e. it is a very rough guess).
  • Next, for each position X of the Go game (except the first), we compute the relative winrate: the difference between LZ’s winrate estimate after move X and after move X-1 (taking the winrate of the player who played move X in both cases).
  • Now we repeat the following steps until the result is satisfactory:
    – Take the board position that most needs exploration (this depends on whether the position has rarely been visited by this algorithm, or has a high relative winrate; see the “Exploration and exploitation” section on the MCTS Wikipedia page)
    – Let LZ do another playout for the chosen board position and for the one before it, and recompute the relative winrate
    – Update the board position: it now has one more visit, and the size of its winrate drop is added to its total score (a high score means a big, game-changing drop)
  • After several of these rounds, pick the position that has been visited most often.

It’s basically MCTS where the simulation phase starts immediately, and the random rollout has been replaced by LZ doing an extra playout.
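Put as code, the loop might look something like the following minimal Python sketch. LZEval and its playout() method are hypothetical stand-ins for keeping one LZ search tree alive per board position (they are not a real Leela Zero interface), and the reward is the size of the winrate drop:

```python
import math

class LZEval:
    """Hypothetical stand-in for one persistent LZ search tree per board
    position; not a real Leela Zero API."""
    def __init__(self, position):
        self.position = position

    def playout(self):
        """Run one more playout and return the refined winrate estimate
        for the player to move, in [0, 1]."""
        raise NotImplementedError

def most_game_changing(evals, rounds=1000, c=1.4):
    """evals[x] evaluates the position after move x (evals[0] is the
    opening position). UCT-style selection over positions, not moves."""
    n = len(evals)
    visits = [1] * n                    # each position starts with one playout
    score = [0.0] * n                   # accumulated winrate drops
    win = [e.playout() for e in evals]  # the initial one-playout hunches

    def delta(x):
        # Winrate change for the player who played move x. win[x] is from
        # the opponent's point of view (they move next), so flip it.
        return (1.0 - win[x]) - win[x - 1]

    for t in range(1, rounds + 1):
        # Selection: standard UCT over positions 1..n-1.
        x = max(range(1, n), key=lambda i:
                score[i] / visits[i] + c * math.sqrt(math.log(t) / visits[i]))
        # "Simulation": one extra LZ playout on position x and on x-1.
        win[x] = evals[x].playout()
        win[x - 1] = evals[x - 1].playout()
        # Backpropagation: reward large winrate drops (negative deltas).
        visits[x] += 1
        score[x] += max(0.0, -delta(x))

    # As in normal MCTS, the most visited node is the answer.
    return max(range(1, n), key=lambda i: visits[i])
```

Returning the three most visited positions instead of one would give the top-3 variant from my earlier post.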

2 Likes

Talking about Dwyrin’s games, it would be nice if some mod would run the full analysis on those games.

Wouldn’t it be kind of rude though? Unless he mentioned wanting it in the video or something.

Would be rather interesting to see the difference at least.

1 Like

He just didn’t know how the feature worked and played two games to give it a try. He looked at the top 3 moves and then tried to run the full analysis, but he isn’t a supporter, so he couldn’t.

What could be rude though?
Games are public.

2 Likes

I don’t know, maybe I was overthinking it. Just don’t wanna force AI reviews on someone who did not ask for them.

Lately I feel that AI output is taken sort of as gospel, even though I think we can’t really play the same way the AI does, and thus a good move for an AI is not always a good move for a human.
Therefore I was a bit worried about being seen as “poking” at his teachings if the AI disagrees with some of his moves.

That said, if he tried to get the review himself but was unable to (I have not watched those vids yet), I am happy to run it :slight_smile: (and btw he is a supporter, just not on his teaching accounts)

So, here they are I guess :slight_smile:


6 Likes

As a counter-argument, Dwyrin reported the experience of the average non-supporter user.

That is what it is: if it needs to be fixed, then it probably should be fixed.

4 Likes

I guess that depends on how much one has the right to expect from a free service.

Well, if I’m honest, in its current state it feels more useful to turn the free three moves off than on.

It’s basically a lottery that selects three quite arbitrary moves in the game, then gives a complicated readout that goes deeper than is really useful, while leaving in several terrible suggestions.


Also, remember that this will negatively influence the expectations the average non-supporting user has about the full review, thereby reducing the number of people who might have considered supporting in order to get a good review of their games.

6 Likes

Actually, I don’t think that expectations of the receiver have anything to do with it.

The real question is what the service itself is trying to achieve, and whether the features we’re implementing achieve that.

I think we are trying to achieve being a great, well-featured place to play go, for all comers.

If that is true, then a feature that is broken should be fixed, irrespective of whether it is paid or not, and feedback from non-paying users might be best addressed directly, rather than by giving them selected samples of the paid experience.

(That being said, it’d be awesome to know if Dwyrin thinks that the full review fixes the problems he reported :slight_smile: )

GaJ

4 Likes