Human MCTS

DVbS78rkR7NVe · April 17, 2019, 5:46pm

So MCTS is kind of a way to evaluate positions and moves by playing out the game many times to the very end.

What if a human did the same? Let’s restrict ourselves to a 9x9 board for simplicity. Would following an MCTS-like approach and playing out maybe 100 simulations result in stronger moves? Of course, that would have to be a rather lengthy 9x9 correspondence game with analysis enabled.

SanDiego · April 17, 2019, 6:11pm

Question: stronger than what?

I doubt that 100 simulations is considered proper MCTS, especially if you don’t combine it with any cognitive patterns (what go engines did before AlphaGo). I would expect that a much larger sample is needed.

Other than that, MCTS is a method and doesn’t care about the underlying “tool”, computer or human.

DVbS78rkR7NVe · April 17, 2019, 6:39pm

But that’s the point. You’re gonna use your brain to play out games that make sense, rather than them being more or less random.

flovo · April 17, 2019, 7:53pm

I think this is called reading for human go players.

So we do it already.

To do MCTS you have to do much more than 10000 playouts. This will take a while.

yebellz · April 17, 2019, 9:17pm

In MCTS, I think the idea behind randomly playing games out to the end is that it is hard to write an algorithm (manually via various heuristics) to perform positional judgement for arbitrary positions in ongoing games. The “average outcome” of a lot of “reasonable” random play outs from a position is instead used as an estimate of the value of that position. However, it takes a lot of (much more than 100) random play outs and they have to be generated with good heuristics to make them “reasonable”. Using very few play outs will probably perform quite poorly, and a human manually trying to follow MCTS would likely perform much better by instead just using their thinking time for reading in the traditional way.
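The "average outcome of random play outs" idea can be sketched on a toy game. This is only an illustration under stated assumptions: the game is a small take-away game (take 1 to 3 stones, whoever takes the last stone wins), not Go, the playouts are uniformly random with no heuristics, and all function names are made up for the example.

```python
import random

def random_playout(stones, to_move, rng):
    """Play uniformly random moves until the pile is empty; return the winner (0 or 1).

    Toy rules (an assumption, not Go): players alternate taking 1 to 3 stones,
    and whoever takes the last stone wins.
    """
    player = to_move
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player

def estimate_value(stones, to_move, playouts, rng):
    """Fraction of random playouts won by the player to move."""
    wins = sum(random_playout(stones, to_move, rng) == to_move
               for _ in range(playouts))
    return wins / playouts

def best_move(stones, playouts_per_move, rng):
    """Score each legal move for player 0 by the average playout result after it."""
    scores = {}
    for take in range(1, min(3, stones) + 1):
        remaining = stones - take
        if remaining == 0:
            scores[take] = 1.0  # taking the last stone wins immediately
        else:
            # The opponent (player 1) moves next; our value is 1 minus theirs.
            scores[take] = 1.0 - estimate_value(remaining, 1, playouts_per_move, rng)
    return max(scores, key=scores.get), scores
```

Calling `best_move(10, 100, random.Random())` a few times gives noisy scores that can change the chosen move from run to run, which is the practical reason a small number of random playouts is a weak estimate.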

Humans and newer AI approaches (based on neural networks) also try to read forward in order to judge the value of a position. However, both will typically cut the reading short of reaching the end of a game. Humans can use intuition and judgement to evaluate intermediate positions, and the newer AI approaches train a neural network to estimate the value of arbitrary positions (and these approaches also train another network that helps generate reasonable moves for reading forward).

@SanDiego makes a good point by asking “stronger than what?”

Your initial question is not so clearly defined, and your follow-up just makes the question more confusing by indicating that you are considering a human that only does something like MCTS, but also leverages their own intuition.

Ultimately, I don’t think the MCTS algorithm should have any impact on how humans manually perform reading while playing the game. MCTS is an algorithm for computers. Humans leverage different heuristics (particularly for judging incomplete games), which have been tuned over thousands of years to be suitable for the way we think.

We would waste a lot of time playing out variations to the end, when our judgement might already provide a good enough estimate of intermediate positions.

DVbS78rkR7NVe · April 17, 2019, 9:49pm

Hear me out. This idea comes directly from playing 9x9 with analysis enabled.

And at least in the kyu range, people are for sure rather bad at judging intermediate positions, especially when it’s all about a difference of only a couple of points. So I found that actually playing out the game to the end in analysis mode and counting the result gives a much better evaluation than intermediate “looks ok” and “such a close game” evals. So I thought: if I force myself to do some sort of human MCTS, even if it takes time, wouldn’t my play be much stronger?

Imagine I want to beat Sadaharu in a ladder by any means possible (without outside help, of course).

Edit: also, part of the idea is that when you play out the game you aren’t playing random moves; you try to be close to perfect play, so you don’t need thousands of playouts.

yebellz · April 18, 2019, 1:57am

Well, yes, if you spend more time reading and exploring more variations carefully, then you will probably find stronger moves and have a better understanding of the position.

I would not call what you seem to be thinking about “human MCTS”. That seems to be just what most people would call “reading” or “analyzing variations”.

SanDiego · April 18, 2019, 2:14am

What you are describing makes sense; it’s called tree analysis, but it has little to do with MCTS.

Monte Carlo simulations are named after the gambling hot spot in Monaco, since chance and random outcomes are central to the modeling technique.

Eugene · April 18, 2019, 2:56am

I’m now not clear whether the OP was talking about doing actual MCTS humanly or not.

At one point:

… this is not MCTS. The whole thing about MCTS is that it is random.

… but this might be. IF the human chooses the branches to follow at random, then it would be different to our normal reading. And it might generate stronger moves because occasionally it will “force” us to read a variation that we would not have thought of which is actually better…

Mulsiphix1 · April 18, 2019, 6:28am

I think that S_Alexander is using the idea of MCTS as an example, not saying "As a human I will follow the rules of an MCTS analysis perfectly. However, I will restrict my playout number to only 100 games."

The question I have here is, how can a human choose a branch at random? If the human is able to know what different branches look like, psychologically their choice will not be random. They will try to make a random choice, but subconsciously there will be reasoning as to why they decided a particular branch was the best random choice. They would need to use an external method for choosing, like a 100 sided die or two D10, to pick randomly.
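As a rough sketch of the "external method" idea, here is how two D10 could be simulated to pick a branch without bias. The function names and the re-roll rule are assumptions made for the example; with physical dice the same re-roll rule applies.

```python
import random

def roll_two_d10(rng):
    """Combine two ten-sided dice into a number from 0 to 99 (tens die, ones die)."""
    tens = rng.randrange(10)
    ones = rng.randrange(10)
    return 10 * tens + ones

def pick_branch(candidates, rng):
    """Pick one candidate uniformly, re-rolling when the roll cannot be split evenly."""
    n = len(candidates)
    if not 1 <= n <= 100:
        raise ValueError("need between 1 and 100 candidate moves")
    limit = 100 - (100 % n)  # largest multiple of n that is not above 100
    while True:
        roll = roll_two_d10(rng)
        if roll < limit:  # rolls at or above the limit would favor low indices
            return candidates[roll % n]
```

For example, `pick_branch(["C3", "E5", "G7"], random.Random())` accepts rolls 0 to 98 and re-rolls on 99, so each of the three branches is equally likely.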

Part of my self imposed Go training requires that when I start a variation, 8 out of 10 times I need to follow it to the very end of the game. I only play 9x9 Correspondence games, I perform 2 to 6 variations per move, and I begin performing variations between moves 4 and 8. Without a shadow of a doubt, I feel that this does provide me with better moves as the game progresses. I am often on the lookout for how certain variations increase the final score by 1 point or more.

I often joke with myself that I play Go against other players very little, instead choosing to treat Go like a solitaire puzzle game where I only truly face myself. The other opponent controls the moves of one color, but I am really playing out hundreds of variations against myself, with the aid of a removed third party. This other human being ensures that I look at the game constantly through new eyes and that I do not rob myself of sufficient challenge (that subconscious aspect I spoke of above).

This is my relationship with Go right now. My life has become exceedingly busy and I continue to consider taking a break from Go, since I spend so much time thinking about and playing it. I can tell you that what scares me is not losing the feeling of competition or not being able to research a fun and exciting hobby. What scares me the most is not being able to do my puzzles. Not being able to take time out of my life for this particular brand of relaxation, contemplation, and reflection. Not being able to watch myself grow on a regular basis.

Even when my ranking wobbles up and down, I regularly see myself overcoming obstacles in matches that previously had stopped me dead in my tracks. I’ve learned what to do and I combat the challenge effectively, repeatedly. This fills me with a great sense of accomplishment and self confidence. It makes me feel so good and I take that with me into other areas of my life. But most of all, Go has become like a friend. We hang out and we have fun and we push each other to be stronger and better. This is a unique experience for me and I’ve come to care for it very deeply.

DVbS78rkR7NVe · April 18, 2019, 7:03am

You can call it MCTS-inspired analysis, if that makes it better. But I thought it wasn’t necessary for it to be random. Point is to have this cycle in it.

In the AlphaGo papers they also call their tree search MCTS, but they choose nodes based on maximizing Q+U. As I understand it, fast rollouts are just choosing the most probable move also?
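For reference, a minimal sketch of selecting by Q+U, using the form of U generally given for AlphaGo: U = c_puct * P * sqrt(total visits of all siblings) / (1 + N). The dictionary layout, the field names and the default `c_puct=1.0` are assumptions for illustration only.

```python
import math

def select_child(children, c_puct=1.0):
    """Return the move that maximizes Q + U.

    `children` maps a move label to a dict with (hypothetical field names):
      prior     - P, the probability the policy assigns to the move
      visits    - N, how many simulations went through the move
      value_sum - sum of simulation results (1 for a win, 0 for a loss)
    """
    total_visits = sum(c["visits"] for c in children.values())

    def score(c):
        # Q is the average result so far; unvisited moves start at 0.
        q = c["value_sum"] / c["visits"] if c["visits"] else 0.0
        # U is large for moves with a high prior and few visits.
        u = c_puct * c["prior"] * math.sqrt(total_visits) / (1 + c["visits"])
        return q + u

    return max(children, key=lambda move: score(children[move]))
```

The selection itself is deterministic: U favors moves that look promising but have been tried rarely, and its influence shrinks as the visit count grows, so Q takes over in well-explored branches.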

So there. Got you. Doesn’t have to be random, at least not entirely.

Anyway.

The first important part is that we’re forcing ourselves to play out the game to the end (unless it’s really that obvious).
Second, I would maybe like to build a tree of variations, with each node containing the number of our simulations and how many times we won.
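One possible shape for such a tree, as a sketch only: each node stores how many play-outs passed through it and how many of those we won. The class and function names are invented for the example, and wins are counted from our own side at every node, which is a simplification chosen for manual bookkeeping.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    move: str = "root"
    visits: int = 0  # number of play-outs that passed through this node
    wins: int = 0    # how many of those play-outs we won
    children: dict = field(default_factory=dict)

    def child(self, move):
        """Return the child for `move`, creating it on first use."""
        if move not in self.children:
            self.children[move] = Node(move)
        return self.children[move]

    def win_rate(self):
        return self.wins / self.visits if self.visits else 0.0

def record_playout(root, moves, we_won):
    """Walk one played-out line from the root and update the counts on every node."""
    node = root
    node.visits += 1
    node.wins += int(we_won)
    for move in moves:
        node = node.child(move)
        node.visits += 1
        node.wins += int(we_won)
```

After recording each finished variation with `record_playout(root, ["E5", "C3", ...], we_won=True)`, comparing `win_rate()` and `visits` across `root.children` shows which first moves have been explored and how they fared, and makes it visible when the same variation is being read again.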

Why do I need this? If we can define this well, then we will have a process that’s hopefully guaranteed to force you to choose a better move than what you would choose by usual reading. And I can beat Sadaharu, maybe.

And so the difference to usual reading is that we have a process to it. In usual reading you often explore the same variations several times, or cut yourself short when you shouldn’t have.

Mulsiphix1 · April 18, 2019, 9:45am

My question quickly becomes, how can this process become familiar enough or presented well enough, that others can use it as a study aid as well? It seems this would need to be done digitally, as poring over 100 sheets of paper-based variations would be… tedious and… difficult.

If you can come up with a system and get somebody to build a software tool to handle variation input, to handle the MCTS-style trees and any other data that needs to be generated to make sense of it all, then you might have something here. Without that though, you are proposing some serious paperwork. The ends must justify the means.

_KoBa · April 18, 2019, 10:36am

I’m 100% sure that any human who is able to read 100 full games’ worth of variations and then choose the best one would be incredibly strong. So much stronger than I am as a ~2–4k, or than the strongest purely Monte Carlo bots (which are somewhat close to my rank), that I really can’t even imagine how powerful that person would be.

Lys · April 19, 2019, 7:47am

I fear the main issue is to try out variations that make sense.

I could try hundreds of variations but would surely miss some good moves that I can’t even figure out from my low skill level.
Reading isn’t just a matter of time but also quality of moves chosen.
If everybody here were to draw a tree of moves from the same starting position, I’m pretty sure that my tree would miss many good branches.

That’s where randomness (or even brute force) is involved, I guess: including moves that you wouldn’t include just following your knowledge of the game.

Mulsiphix1 · April 19, 2019, 9:18am

I agree completely. Having a tool like Leela (Zero) is extremely beneficial in self-led reviewing. I use her religiously to try and see move possibilities that my limited 16k knowledgebase couldn’t conceive. She often plays a move that boggles my mind, but after some deep analyzation it becomes obvious why she chose to play there.

I’ve been doing this about a month and I feel my game has improved as a direct result. More importantly, I notice that I am beginning to think about the potential of each move in a deeper fashion than ever before. So exciting!

Vsotvep · April 19, 2019, 12:41pm

How about trying it out? Who’s going to beat Sadaharu?

Deep_Scholar · April 20, 2019, 3:13am

@Vsotvep hehe, good luck to people doing that. I beat him once so far and it is a challenge.

SanDiego · April 20, 2019, 6:50pm

Sure, let’s start.

So you have black and need to decide the first move. You play 100 games with yourself, which takes maybe 15 to 20 hours, then do the math and pick the one with the best score.

Do you feel those 100 games give you a useful indication of what your first move should be? Did you learn a lot from playing 100 fast games against yourself? More importantly, how do you capture that result so you can reuse it?

Now your opponent takes two seconds to play move 2, and you’re up for another 20 hours figuring out move 3. By the time you reach move 10, you’ll have spent close to a hundred hours playing 500 fast games against yourself. And learning what?
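The arithmetic behind that estimate, as a small sketch. It assumes 100 fresh games for each of Black's five decisions before move 10 (moves 1, 3, 5, 7 and 9), nothing reused between moves, and 9 to 12 minutes per game, which is what 15 to 20 hours per 100 games works out to. The function name is made up for the example.

```python
def cost_without_reuse(decisions, games_per_decision, minutes_per_game):
    """Total games and hours when every decision starts from scratch."""
    games = decisions * games_per_decision
    hours = games * minutes_per_game / 60
    return games, hours

# Five decisions of 100 games each, at the fast and slow ends of the estimate.
fast = cost_without_reuse(5, 100, 9)    # 15 hours per 100 games
slow = cost_without_reuse(5, 100, 12)   # 20 hours per 100 games
```

With these figures `fast` is 500 games in 75 hours and `slow` is 500 games in 100 hours. The total scales linearly with the number of games per decision, so the estimate depends heavily on whether results from earlier moves are thrown away.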

Vsotvep · April 20, 2019, 7:09pm

Well no, that’s not how MCTS works. You don’t start from scratch with each move; with that kind of strategy I would be surprised if any MCTS program ever got above 20k. See the post by @S_Alexander above for how it works:

Also, worst case scenario you play a lot against yourself, but you do get to try and find new moves that you haven’t tried before, since you have to create new branches that you wouldn’t consider under usual circumstances.

Finally, it was never mentioned that this method is supposed to be an efficient way to make you learn things, was it?

DVbS78rkR7NVe · April 20, 2019, 7:13pm

I don’t think it takes that much time to play a 9x9 game, especially for a good blitz player. And it’s playing go, shouldn’t you like playing go?

Learning go, you dummy.

Of course, I didn’t try 100 games, but I often study the game by playing out variations to the very end before picking moves on 9x9. It does help me make a better move. And I do learn a little bit of stuff, for example, from playing around with different orders of playing endgame moves.