Dwyrin is testing the AI / Suggestion for improvement

#1

He seems to agree that some things about the new AI stuff are still quite confusing. He also ran into a bug with the timeout timer and a problem with the scoring algorithm (the AI review is discussed at 20:20 and 51:37, the other bugs show up just before those reviews).


I think the “top three moves” feature is seriously failing to show the top three moves. It may look ridiculous to people, as it sometimes:

  • Shows the biggest mistake is actually the best move on the board
  • Shows the biggest mistake was a move with a positive win percentage
  • Shows that the biggest move is at a position it barely has read out
  • Shows three moves that were played very quickly after each other (move 78, 83 and 84 or something)
  • Reads out ridiculously long sequences that are not likely to actually occur in a game

I believe that this will harm potential future subscriptions, as the real deal is of course the AI review option for supporters, and this will give them very low expectations for an otherwise good feature.

So I suggest making some changes:

First of all, the way it finds the three moves does not make sense to me. It makes a “hunch”, and then reads out the moves from that hunch, but this of course does not improve the accuracy of the hunch. Since its hunch fails quite miserably quite often, it doesn’t really matter that the three moves are then read out. A far better system would put most of the effort into the hunch and less into reading the moves out, rather than reading out a very bad guess in depth.

Secondly, I feel that the AI reading out 18 moves ahead but only showing a single variation is not very helpful either. It has to be a very special case where there is a forced succession of 18 moves such that each of those moves is actually “the only move”. Far more likely is that only the first three or four moves give insight into the situation, and the moves after that are just rough speculation or wishful thinking by Leela. Naturally Leela doesn’t base her judgement only on these 18 moves, but on a lot more branches.

Thirdly, the experience can be confusing to the player. It is not very clear whose move is the wrong one. I need to take several steps in my thinking: first checking what the last move was, then concluding that it is the other player who is about to make the mistake, and finally searching for the letter A (which isn’t always easily found, since sometimes it’s other letters that are bright green). I’m sure this could be done better, for example by putting the actual mistake on the board, putting a red cross on top of it and making the main suggestion by Leela stand out more.

Fourthly, I believe it would also be good to impose some limit on how close together it allows mistakes to be. I can foresee that it will ping-pong through consecutive moves, just because both players ignored a very important move elsewhere on the board. Of course it is helpful to have this pointed out once, but if it’s done three times, that kind of wastes the potential of the move suggestion. So perhaps it would be interesting to spread the moves out over the game, say by finding the one biggest mistake in each third of the game?
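
To make the "one per third" idea concrete, here is a minimal sketch. It is only an illustration, not how OGS selects moves: the function name and the input format (a list of Black's win rates after each move, with index 0 being the empty board) are assumptions made for the example.

```python
def biggest_swing_per_segment(win_rates, segments=3):
    """Pick the move with the largest win-rate swing in each segment of the game.

    win_rates[i] is Black's win rate (0..1) after move i; win_rates[0] is the
    starting position. This input format is an assumption for illustration.
    """
    # (size of the swing, move number) for every move in the game
    swings = [(abs(win_rates[i] - win_rates[i - 1]), i)
              for i in range(1, len(win_rates))]
    n = len(swings)
    picks = []
    for s in range(segments):
        chunk = swings[s * n // segments:(s + 1) * n // segments]
        if chunk:
            # On ties, keep the earliest move in the segment
            picks.append(max(chunk, key=lambda t: t[0])[1])
    return picks


rates = [0.5, 0.5, 0.3, 0.3, 0.3, 0.6, 0.6, 0.6, 0.6, 0.1]
print(biggest_swing_per_segment(rates))
```

With this selection, three consecutive moves can never all be reported, because each segment contributes at most one move.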

Fifthly, sometimes there is only “one move” or only “two moves” that work. In that case it shouldn’t show any of the barely explored sequences. I would suggest not showing any sequence that hasn’t received at least, say, 5~10% of the total number of playouts.
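
As a sketch of such a filter, assuming each candidate move comes with its own playout count (the dictionary keys `move` and `playouts` are made up for this example and are not the real OGS or Leela Zero data format):

```python
def well_explored(candidates, min_share=0.05):
    """Keep only candidates that received at least min_share of all playouts."""
    total = sum(c["playouts"] for c in candidates)
    if total == 0:
        return []
    return [c for c in candidates if c["playouts"] / total >= min_share]


candidates = [
    {"move": "Q16", "playouts": 180},
    {"move": "D4", "playouts": 19},
    {"move": "K10", "playouts": 1},  # a single playout: barely looked at
]
print([c["move"] for c in well_explored(candidates)])
```

Raising `min_share` to 0.10 would leave only the main line in this example.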

Sixthly, it would indeed be nice if the sequences were loaded into the game tree, so you can click through them. It would then also become possible to feature several branches instead of just a single one.

11 Likes
#2

Oh, so you already did it! :smiley:

#3

Did what?

#4

Post Dwyrin’s video in the forum. :slight_smile:

I agree that the way it’s done now, the top three moves can be counterproductive.

But I also think that some of the issues you pointed out are typical of AI analysis and not just the OGS implementation.

2 Likes
#5

Thank you for posting this.

But the bright green one is the best move. Maybe it would be better to hide the lettered variations and focus only on the bright coloured one or two variations. The green variations are the moves LZ explored and would play after all.


One can detect whether both players ignored an important move by looking only at the win rate changes after one colour’s moves. (How does Black’s win rate change between Black’s move and Black’s next move?) I’m not sure if one should call a move “game changing” if it changed nothing because neither player played it.
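
A small sketch of that comparison, under simplifying assumptions: no handicap, no passes, Black plays the odd-numbered moves, and the win rates are given from Black's point of view. The function name is invented for the example.

```python
def same_colour_deltas(black_win_rates, colour="B"):
    """Win-rate change between one colour's move and that colour's next move.

    black_win_rates[i] is Black's win rate after move i (index 0 is the empty
    board). Returns (move_number, change from that colour's point of view).
    """
    start = 1 if colour == "B" else 2   # first move played by this colour
    sign = 1 if colour == "B" else -1   # flip the sign for White's point of view
    return [
        (i + 2, sign * (black_win_rates[i + 2] - black_win_rates[i]))
        for i in range(start, len(black_win_rates) - 2, 2)
    ]


# White ignores a big point at move 2, Black ignores it as well at move 3.
rates = [0.5, 0.5, 0.75, 0.5, 0.5]
print(same_colour_deltas(rates, "B"))
```

Move by move, this game shows two swings of 25pp in a row. Measured between Black's own moves the change is zero, which matches the point that nothing really changed because neither player took the big point.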


I think the “biggest mistakes” could be improved if, after we make a hunch over the whole game, we look at how the win rate changed compared to a few moves earlier instead of just one move earlier. Going over a game with only a few playouts is a fast way to find when game changing moves happened, but it fails to find the exact move. The bad move could be a few moves earlier but wasn’t detected then due to the low playout count.
After we have identified a region in which the game changed, we have to go over the whole region with more playouts / a stronger net to really identify the game changing move.

Another way to improve the result would be to discard positions which, after we read them out, turn out not to be game changing at all (discard moves with a positive pp, or whose pp changed much less than anticipated) and use the fourth, fifth, … bad move instead.
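
Roughly sketched, with everything hypothetical: `quick_deltas` stands for the estimated win-rate losses from the fast pass, `deep_eval` for whatever deeper reading is available, and the thresholds are arbitrary numbers picked for the example.

```python
def confirmed_game_changers(quick_deltas, deep_eval, wanted=3,
                            min_loss=0.05, budget=10):
    """Verify candidates from the fast pass and skip the ones that do not hold up.

    quick_deltas: {move_number: estimated win-rate loss from the fast pass}
    deep_eval:    callable(move_number) -> win-rate loss after deeper reading
    budget:       maximum number of candidates to read out in depth
    """
    ranked = sorted(quick_deltas, key=quick_deltas.get, reverse=True)
    confirmed = []
    for move in ranked[:budget]:
        loss = deep_eval(move)
        if loss >= min_loss:  # the hunch was confirmed by the deeper reading
            confirmed.append((move, loss))
        if len(confirmed) == wanted:
            break
    return confirmed


quick = {10: 0.40, 25: 0.30, 40: 0.20, 55: 0.15, 70: 0.10}
deep = {10: -0.02, 25: 0.28, 40: 0.01, 55: 0.12, 70: 0.09}
print(confirmed_game_changers(quick, deep.get))
```

Here moves 10 and 40 looked bad in the fast pass but turn out fine when read out, so the fourth and fifth candidates take their place. The `budget` parameter caps the extra cost of the deeper reading.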

And maybe change the name of “Top 3 game changing moves” to “3 game changing moves” :wink:


Edit: After completely watching the video:

Yes, yes, yes. Don’t show a move which is only there because LZ spent one playout on it before discarding it. Just stick to the bright coloured variations.
And cut the variation depth down. After some moves LZ starts playing elsewhere, which doesn’t have anything to do with the “mistake” we made.


Edit 2:
dwyrin made 2 videos about AI review with LZ in general some time ago. I think they still apply to this feature


2 Likes
#6

anoek’s away for a few days, so we’ll have to wait a bit for feedback and response.

I hope Dwyrin was clear that this is still in its early stages…

GaJ

1 Like
#7

Of course not.

3 Likes
#8

FWIW, the timeout thing is not strictly a bug, but it is confusing.

The counter dwyrin was looking at was the disconnect timer. This kicks in when your opponent disconnects, and is intended to stop you having to wait the full timeout in that case.

Of course, if they disconnect with less time left than the disconnect timer, then they get timed out when the game timer expires, even though there can still be time left on the disconnect timer.

Really, the disconnect timer should start at the minimum of the remaining game time and the short disconnect time (or not kick in at all) in these circumstances.
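
In code, the suggested behaviour would look something like this. It is a sketch only: the function and parameter names are invented and have nothing to do with the actual OGS implementation.

```python
from typing import Optional


def disconnect_countdown(game_time_left: float, disconnect_limit: float,
                         hide_when_shorter: bool = False) -> Optional[float]:
    """Seconds to show on the disconnect timer, or None to show no timer.

    If the opponent's clock runs out before the disconnect limit would, either
    count down the game clock instead or do not start the timer at all.
    """
    if game_time_left < disconnect_limit:
        return None if hide_when_shorter else game_time_left
    return disconnect_limit


print(disconnect_countdown(300, 60))
print(disconnect_countdown(20, 60))
```

Either variant avoids the confusing situation where the opponent times out while the disconnect counter still shows time remaining.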

4 Likes
#9

Are you sure? Someone else told me that “A” is the best move and that the bright green means only the most investigated move, not necessarily the best one (which is “A”).
This can be confusing though.

He stated that’s a new feature and expressed his disappointment, though specifying clearly that a full review by the AI should be more useful than the “3 moves” feature.

However, it’s a new feature but it’s released. It isn’t some kind of “beta” thing.
It’s out and everybody can see it. So if it doesn’t work well, it should be fixed… or people could think, with good reason, that it’s broken.

#10

LZ plays the most explored move, so I would call it the best move.
The move labeled A is the move for which LZ reported the biggest chance to win, yes. But LZ doesn’t think it’s the best move to play. This is an effect of the limited number of playouts. And maybe because the win rates are almost equal.
If LZ has spent many playouts on one variation, she only explores it further if it has the (almost) biggest chance to win, so …

#11

So this is even more confusing for me: we have examples where the green move doesn’t even have a letter.

Also, this was you quoting Anoek:

The color intensity indicates how much a position was explored, and the letters indicate the order in which the bot would choose to play.

2 Likes
#12

I agree improvements are needed and am sure they will come. But I also feel like there is kind of a wrong expectation of the function. It is an AI that moreover has to be kept kind of below its max strength to be even feasible to run. It is NOT a substitute for a teacher (let alone for a Dan player). It can give lower ranked players an idea of where to look, but it will not magically show and explain to an SDK (let alone Dwyrin :stuck_out_tongue: ) his/her only weakness. Obviously.

Well, yes and no. We are not a company, we do not have beta testers, etc, etc. Please try to keep in mind that this was all put together by one guy. It ran on beta server for a while, but nobody really plays there so there is no live data, no feedback. In the sense of being a community it makes sense to me to release it even though it is not absolutely polished just yet. You can see how much valuable feedback it produced :slight_smile:

Otherwise, thank you for the feedback and polite presentation, I pretty much agree with most of the points. The only thing I don’t think is a good idea is limiting how close together mistakes are allowed to be. I get what you are saying, but I can also see two severe mistakes being played in a row, and even being shown three times that you really should have played ONE move instead of all of those other small things can (maybe even very often :smiley: ) be a super valuable lesson, I think…

8 Likes
#13

I agree like everyone here: nobody talked about teachers.

It could.
But right now it says “look at this big changing move” and you look at the board and see that the player actually played the move that is both green and “A”. It’s meaningless.

We are all expecting to see mistakes but we don’t (well, sometimes we do, but very often the green + “A” overlaps the triangle).
Is this the “wrong expectation”?

Isn’t it strange that someone could say: “hey, buddy, you made a mistake and that’s exactly the move I would’ve played and the best one in my opinion!”
Isn’t that nuts?

1 Like
#14

It has never been presented as analysis of mistakes, and is explicitly not coded as such.

It is the moves that had the greatest impact on the game.

So to correct your example:
“Isn’t it strange that someone could say: ‘hey, buddy, you made a move that had a great impact and that’s exactly the move I would’ve played and the best one in my opinion!’
Isn’t that nuts?”

No, that is not nuts.
It is just the case that when looking from any one board position to the next, the biggest move made was a positive one, not a negative.

If you were losing, and you played an amazing move that put you well ahead…even if you later ruined it again that amazing move would be worth highlighting.

1 Like
#15

I think there was sufficient feedback in the AI announcement thread. The move +41.5pp in the example game was a big red flag. Before launch I’d expect either a fix or a successful rebuttal of the concern in the thread. Instead it was just… launched.

Before limiting the AI, one should make sure that it’s running correctly. The example game analysis was advertised as using 200 playouts, but its quality was certainly below that, so there is something to fix. If the AI is running correctly and OGS still can’t do a quality analysis for everyone, then how about performing a decent quality analysis of 1 game out of 20 at random (for non-subscribers)? That could be a better advertisement of the feature than a poor analysis of every game.

1 Like
#16

If there was this amazing move available on the board, what does ‘you were losing’ mean?

5 Likes
#17

There are no positive moves.
Win rate starts at 50%. Then you can only keep your win rate or lower it.

A positive delta means that White made a mistake. A negative delta means that Black made a mistake.

The biggest game changing moves are all blunders.

1 Like
#18

I think what nobody is talking about is that the 3 moves are essentially at first just a Leela hunch from quickly looking over the game at like 1 playout or something, and then those 3 moves are analysed a little deeper.

Think of it like Leela saying these 3 moves are interesting and possibly where you went wrong, but after looking at them a little deeper, actually this one was good.

Edit. Here is anoek saying I think pretty much the same thing

1 Like
#19

I have already found it enlightening. It may be better for kyu players than for high dans.

2 Likes
#20

Well, it’s called the “Top game changing moves”, so you’d expect it to show three moves that had a great impact on the game. I think that perhaps at 4 or 5 dan, Leela will stall against opponents with these few playouts, but I’m still easily beaten by LZ even if she only plays 10 playouts or so. Her “hunches” are usually pretty close to dan level.

Well, it is coded as such: it searches through the game quickly to find the moves that changed the balance most, and then reads out the top three of those moves to see if they actually were game changers. There is no second step if she finds out she made the wrong choice though, so it presents them as the game changing moves, but doesn’t catch the error even after she has found out herself that it actually was a good move.

This is not what it does either, I’ve seen it mark moves that are +0.3% or something. Virtually neutral moves so to say.

And from the perspective of game theory, this is not something that is really possible: if you were losing, there is no amazing move that will put you well ahead, that is the definition of losing. If there were such a move, you were just winning, but made a wrong judgement.

We don’t want to show Leela’s wrong judgement though, we want to gain insight into the actual position. Otherwise we could’ve just as well picked a 10k level bot to do the job.

I tend to disagree. Leela’s moves are quite high level, so the suggestions she makes are often only understandable if you’re a high level dan.