Alpha Go Zero team: a challenge to finish what you started


#1

What if we taught Alpha Go Zero to play the most entertaining game?

Programmers love optimizations. They will tinker for hours to increase the speed of analyzing a game, or work on a subroutine for moves in a board position that predicts 50 moves ahead.

This makes for a very lofty goal: creating a program that can play the game of Go against the best human players who have ever existed. It’s mind-blowing to imagine how that must feel for the programmers. What an achievement! To have created something no one, particularly experts, thought would be possible for generations. Alpha Go Zero did it without any help from previous human games.

Now that it’s been accomplished, and seriously it’s amazing, the team has rightfully moved on to look at more “serious” matters like finding tumors that doctors might miss and making the roads safer with automated driving. You want the next challenge, I understand that. But I think the work with the board game Go is missing something. An opportunity to really test these brilliant minds and to shape the coming age of Artificial Intelligence.

Make a version of Alpha Go that doesn’t play the best move, but rather the most entertaining move.

We’re humans. In fact, I’ll go further and say we’re only human. And if you sit me in front of Alpha Go Zero to play a game, I’m going to lose. I’m only 20k, so there’s no doubt about the outcome. In fact, I’ll know that I’m going to lose, and I don’t think I would enjoy playing after a few games. The novelty of playing a computer wears off pretty fast when you just aren’t going to win.

But what if Alpha Go played differently?

What if it played the most entertaining game of Go I’d ever played? Thrilling because it was right in that sweet spot of challenge where the outcome was unknown. Where my really bad moves lost me a few points instead of the game, and my good moves were rewarded with reasonable and sound responses, but not perfectly optimized winning moves?

What if Alpha Go played like a teacher who loved its student and just wanted me to have some fun and grow as a player? Optimized not to win the game, but to win my heart? To actively and engagingly create a human experience.

Such a creation, in my opinion, would be far superior to what has been accomplished so far, and much harder.

Humans are the goal. Alpha Go has shown that it can beat the best players, but the best players are more fascinated by what they are learning about the game of Go from the program. That level of fascination could happen at all skill levels, for all players. That’s the promise of AI that we deserve, as humans. While tasks we perform in all walks of life are increasingly replaced by automation, we have to keep sight of a goal: AIs are not the end, but a means to human enjoyment and happiness in life.

Finish Alpha Go Zero. Make it the most fun computer program for any human player to play against. Then you will have done something useful for all humans. Then I can see a path where computers aren’t something people have to be afraid of, but something they look forward to. Yes, it’s incredible to optimize and create the most advanced machine learning subroutine that makes the very best moves, but to make the move that is best for the human opponent, at any level, is truly a challenge beyond what has been accomplished so far.

If you could do that, then we would all be moving into a brighter future.

[Edit: This is a cross post from another place, another account, and as such there is much debate. Be civil and thoughtful.]


#2

I would prefer ‘most educational move’ to ‘most entertaining move’, so that it plays in a way that teaches what I most need to learn after it’s analysed a collection of my games. Also with verbal/text advice on the nature of my weaknesses compared to those of similar rank, e.g. ‘You cede the initiative too often’ or ‘You always use the same pincer regardless of circumstances.’


#3

Exactly.

A program designed to benefit humans, to entertain and teach. The goal of winning against humans has now been passed. The new goal marker, IMHO, must be to help humans.

Either by teaching, entertaining or both simultaneously.


#4

Leela Zero’s raw network (without playouts) is very strong and plays fast.
It could be used for a very fun-to-play bot:

Generate a move with 100% random coordinates.
If Leela wins against itself after that move, play it against the human.
If Leela loses against itself after that move, play the normal move against the human.

So if the human plays badly, the bot will play a lot of random moves, but will inevitably win.
If the human plays well, it will not play random moves, and the human will win only if they are stronger than the raw network (3 dan+).

Instead of a random generator, any algorithm for move choice can be used (play most moves in the center, play most moves on the left side, …).
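The recipe above can be sketched as follows. This is only an illustration of the control flow, not the real Leela Zero interface: `winrate_after` is a hypothetical stand-in for the raw network’s self-play evaluation, stubbed here so the logic runs on its own.

```python
import random

def winrate_after(board, move):
    # Stub for Leela's raw-network evaluation of the position after `move`.
    # Here `board` is just a dict mapping candidate moves to winrates.
    return board.get(move, 0.5)

def choose_move(board, legal_moves, best_move, rng):
    """Try a random move; keep it only if Leela still wins after it."""
    candidate = rng.choice(legal_moves)        # move with 100% random coordinates
    if winrate_after(board, candidate) > 0.5:  # Leela beats itself after the move
        return candidate                       # play the random move
    return best_move                           # otherwise play the normal move

rng = random.Random(0)
# Toy position: winrates Leela would assign itself after each candidate move.
board = {"A1": 0.9, "B2": 0.2, "C3": 0.7}
move = choose_move(board, ["A1", "B2", "C3"], best_move="D4", rng=rng)
```

Against a weak opponent most random candidates still win, so the bot plays many of them; against a strong opponent the fallback branch dominates and it plays normally.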


#5

The natural problem here is that this is incredibly hard to define.

What makes a move more entertaining or educational?
How do we tell that to the machine?
Isn’t the level of entertainment or education completely subjective, depending on the player involved?
How then would we be able to tell the preferences of that player?

Sure, it would be nice to do, but also a nightmare to program, unless we want to rate every move in post-game analysis…


#6

This is not exactly what the OP is asking for (as it wouldn’t necessarily be educational or entertaining (or maybe it would)), but a fun idea nonetheless. Although, instead of playing a random move, it could play a move that gets the winrate back to 50% (or 51%). Then the moves probably won’t look quite as odd.

This way, you will get an opponent at your exact same level. Will that help you get better though? Probably not…
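The selection rule suggested above amounts to a one-liner. This sketch assumes the engine’s candidates are available as (move, winrate) pairs; `closest_to_even` is a hypothetical helper name, not part of any real engine API.

```python
def closest_to_even(candidates):
    """Pick the candidate whose winrate is nearest to 50%.

    `candidates` is a list of (move, winrate) pairs, with winrates
    in [0, 1] as an engine's analysis output might report them.
    """
    return min(candidates, key=lambda mv: abs(mv[1] - 0.50))

# With winrates 0.82, 0.51, and 0.30 the 0.51 move is chosen.
pick = closest_to_even([("A1", 0.82), ("B2", 0.51), ("C3", 0.30)])
```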


#7

But OGS could start populating the server with such fake dans, augment them with a chatbot feature and slowly but surely attract real human dan players to the server 8)

Masterplan.


#8

I made this version of LZ. It basically samples ten moves or so, and plays the move that keeps the winrate closest to the 45-55% range, unless it’s given no choice, in which case it selects the better move.

Check out the KGS bot “YourRank”. It’s played many, many games, and only lost one or two while constantly playing 50% winrate moves. It plays oddly, but not poorly. The most recent game was against a 2dan, who was about to kill LZ (and win the game) until he exclaimed “nooo” in chat and then realized “ok white was probably never in trouble in the first place”. :slight_smile:


#9

Can you make it lose 1/3rd of the time? If you can do that, then you are very close to what I would want.


#10

That is simply awesome :slight_smile: I looked at the 2d and 3d games and didn’t see that many odd moves, but the 3d resigned very early in the games I looked at; not sure why.

Maybe you could ask those that play it to leave a comment on how it felt to play against it and why they resigned (if they do so).

Are you only looking at the moves made available to you and selecting the one closest to the range, or are you forcing it to look for “worse” moves if it is too far ahead?


#11

I forced LZ to put a few hundred playouts into a bunch of different moves, and forced it to search paths that had close to a 50% winrate. If a path ceased to have a winrate close to 50%, then it stopped getting playouts. Then I also let the vanilla, unmodified move search have a few hundred playouts to spend however it normally prefers. At the end, I sort by winrate so that 45-55% winrate moves are preferred, then 55-65% winrate moves are ranked lower, then 65-75% lower than that, etc.; 35-45% winrate moves are ranked lower even than 100% winrate moves, then 25-35% still lower, then 15-25% lower again, with 0% winrate moves lowest.

The key parts are: 1) forcing it to spend playouts on MANY candidate moves; 2) forcing it to spend more playouts on candidates that maintain a 50% winrate; and 3) ranking all the moves with at least a few hundred playouts so that moves closest to 50% are favored, with a bias towards winning moves.


#12

That’s a very interesting idea. Let me think about it a bit more and see what I can come up with.

ALSO! Any other interesting ideas for wacky LZ-based bots like this?


#13

An LZ that tries to maximize winning margin, alias OverplayBot.


#14

I think I can do this quite easily! And it will require minimal coding and only some light tuning if it works. This might be a nice addition for handicap games as well. I’ll give “OverplayBot” a try tomorrow and see what happens!

Anyone have other wacky/awesome ideas for fun LZ modifications? Tell me!


#15

I’ll be running “YourRank” on KGS for the next 6 hours. Feel free to challenge it if you like!


#16

I can’t at the moment, although I would dearly love to try! Please let me know the results, however, because if you get it tweaked to win 1/3rd of its games, I think you will have created a wonderful resource for players.

Thanks for taking the time and interest in this idea.


#17

Interesting, I had exactly the same idea a week ago. Any chance of making your modified LZ available for download?


#18

Could you get YourRank to play on OGS? (and probably on the cloud)

I’d love to try a game or two against it.


#19

Just watched the end of a game now (against alphagogo (which is not a bot)), but I think it just timed out and lost that way (it was ahead). Also, it didn’t open up a new one?


#20

Link to the game?