This is very nice! Thanks a lot, anoek, for working on this.
Looking at the board from GaJ’s post, I have a few questions.
LZ’s moves are indicated with both letters and background color/intensity; how is one supposed to read that? (I.e., B looks better than A in the example, but then why not always use label A for the bot’s recommended move?)
Would it be possible to show the bot’s proposed variation as a branch in the analyze mode panel? (Not sure how much it’d help, but maybe do it just for the few largest missed-opportunity moves in the game?)
I think the discrepancy is an artifact of not enough playouts; the playout count is pretty low on the beta site. Given enough time, the two estimates would probably match up, but in a short time frame I’m assuming Leela started exploring another move and realized it might be better than the move it had already explored a lot. Note that I am not a Leela Zero expert by any stretch.
In the example game at move 42, I see “D13 41.5%”. How do I interpret this? Does this mean that LZ was surprised by the next move by GnuGo? If LZ is so much stronger than the players it reviews, then in theory it should only find blunders (negative changes) and never be lectured by the players (positive changes). A small positive change (0.5%) would be acceptable noise, but large positive changes are worrisome and could be a sign of a bug somewhere.
That shows the change in the estimated percent chance of that player winning, after that move was made.
The blue is the quick estimate given by the neural network; the purple is the estimate after Leela Zero has done a number of playouts and visits. So the blue is basically Leela Zero’s hunch, and the purple is what it thinks after it has thought about it for a while. I need to document that better…
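For what it’s worth, here’s a minimal sketch (my own names, not actual OGS code) of how a per-move change could be computed from a sequence of winrate estimates, assuming the engine always reports the winrate from the point of view of the player to move:

```python
def winrate_deltas(winrates):
    """Given winrates[i] = estimated winning chance (in %) for the player
    to move before move i+1 is played, return the change each move caused
    from the mover's own perspective."""
    # After a move, the estimate flips to the opponent's perspective,
    # so the mover's new winrate is 100 - winrates[i + 1].
    return [(100 - winrates[i + 1]) - winrates[i]
            for i in range(len(winrates) - 1)]

# Example: Black at 55% plays a move, after which White is estimated at 60%.
# Black's winrate dropped from 55% to 40%, a -15% change.
print(winrate_deltas([55.0, 60.0]))  # [-15.0]
```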
My question was how to interpret a large positive value like 41.5%. In my view the full interpretation table is something like
-30% : blunder
-5% : mistake
0% : good move
5% : LZ admits it slightly misjudged the previous position
30% : LZ admits it severely misjudged the previous position
Do you agree with this table? If not, then how would you express the meaning of a large positive value like 41.5% in English?
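To make the table concrete, here is how I would encode it (the thresholds and labels are just my proposal above, not anything the site actually does):

```python
def classify_delta(delta):
    """Map a winrate change (in %) for the player who moved to a label,
    using the thresholds from the table above."""
    if delta <= -30:
        return "blunder"
    if delta <= -5:
        return "mistake"
    if delta < 5:
        return "good move"
    if delta < 30:
        return "LZ admits it slightly misjudged the previous position"
    return "LZ admits it severely misjudged the previous position"

print(classify_delta(-41.5))  # blunder
print(classify_delta(41.5))   # LZ admits it severely misjudged the previous position
```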
Since I’m worried about a bug but you’re not, I investigated a bit more. When I analyze the example game on my computer with the same network and the same number of playouts (200), the winrate doesn’t change at all at move 42 (D13 0%). The largest positive change over the entire analysis is 8.8%, and this number drops to 6.7% with 2000 playouts.
This supports my claim that seeing 41.5% in the preview, and so many moves above 10%, indicates that something is not right.
I think you’ve got to curve the table, because win rate is nonlinear. For example, if my win rate is 29%, your table says I can never blunder, even if I lose 60 points in one move. Conversely, if the game is even, where a 2-point mistake is worth 10%, your table might flag a 6% loss as a mistake even though it doesn’t actually lose any points. I feel a table like yours should try as best as possible to emulate score, where a 4-point loss is marked as questionable, and you judge mistakes and blunders from there.
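One way to curve the table is to map winrates back to an approximate point lead with a logistic model and judge moves by points lost instead of raw percentage. This is only an illustration; the steepness constant below is chosen to match the “2-point mistake is worth 10% in an even game” figure, not anything fitted to Leela Zero:

```python
import math

K = 0.2  # illustrative steepness: ~10% winrate swing per 2 points near 50%

def winrate_to_lead(winrate):
    """Invert a logistic winrate model, winrate = 1/(1 + exp(-K*lead)),
    to get an approximate point lead from a winrate in percent."""
    p = min(max(winrate / 100.0, 1e-6), 1 - 1e-6)  # clamp away from 0 and 1
    return math.log(p / (1 - p)) / K

def points_lost(before, after):
    """Approximate points lost by a move, given the mover's winrates (%)
    before and after the move."""
    return winrate_to_lead(before) - winrate_to_lead(after)

# Near 50%, a 6% winrate drop corresponds to barely over a point...
print(round(points_lost(50, 44), 1))  # 1.2
# ...while dropping from 29% to 10% is worth several times as much.
print(round(points_lost(29, 10), 1))  # 6.5
```

With a mapping like this, the same absolute winrate change means very different things depending on how close the game is, which is exactly why a flat percentage table over- and under-flags moves.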