Help creating "Kata Master" AI

Dear fellow Go players. There is an idea I have had for a long time and have been discussing publicly, and now I think the time is right to actually move on it. Any advice, help, and feedback that any of you can offer would be appreciated.

Here is the idea: We all know that Zero-style AIs are all the rage now and everyone and their mother is trying to copy that style. But I still think there is a lot of use and room for an AI with a more human style. After all, AG Master had a wonderful style much closer to human play and still dominated the world’s strongest players. Also, considering that a lot of AI moves come with an asterisk saying “provided the next 50 moves are played perfectly”, I think such an AI might be a better resource for amateurs to use for analysis and learning, even if it were marginally weaker than a Zero-style AI.

So here is my idea: I would very much like to start a group effort to train “Kata Master” and replicate AG Master in a similar way to how AG Zero was replicated, i.e. train the AI first on a collection of human games from the pre-Zero era and then do a lot of self-play.

Key parts of the project as I imagine them:

  • Recruit a strong and well-known player (or several) to act as guarantor of the project’s legitimacy
  • Set up a GoFundMe to raise money to buy sufficient computing time on a cloud platform and to reward an AI developer willing to help (necessary since I do not have the skill set myself), and figure out how to make the money flow transparent so there is trust - the current idea is that anything left over would be donated to the EGF and AGA
  • Enlist help putting together and curating the training set of games. I have pretty good ideas where to start, but any help and assistance is appreciated (a good thing about KataGo is that, as I understand it, we can go all the way back to Dosaku since it supports dynamic komi)
  • Set up a group Discord to discuss and coordinate

I would really like to get this ACTUALLY going and make it happen.

Anyone seriously interested in lending a hand, please contact me by PM. Thank you.

I think before starting a project with a specific intention like this, it is very important to spend time on research and discussion, figuring out whether it is likely to achieve your goal, and whether there are better/easier/cheaper ways to achieve that same goal.

I suspect the approach of “train on human play first and then improve through self-play”, although it might help slightly, will not produce results as good as you want. Over the course of a reinforcement learning loop, it is very easy for a net to forget its source material and converge to roughly the same point a from-scratch run would reach anyway. (There is some good intuition you can gain here from the “catastrophic forgetting” machine learning literature about how and when this can happen.)

In fact, one can see evidence that this happened with AG Master - except perhaps for some minor biases in a few opening moves, like a preference for approaching versus the 3-3 invasion, AG Master’s “style” is actually very hard to distinguish from the current top bots. For example, try running KataGo 40b on AG Master vs Ke Jie, game 1 (Future of Go Summit): agvskejie1.sgf (1.8 KB)

You should find that the agreement rate between the two is fairly high for the vast majority of the game, and it’s not easy to point to any clear or consistent stylistic differences. Again, aside from very slight opening-preference differences, I think many people might be over-romanticizing the degree to which AG Master was human-like and human-understandable. It kind of wasn’t, not as much as people think.

And if you say that all you care about is the minor opening-preference differences - if that is enough - then there are obviously far cheaper ways to slightly bias opening play and leave everything else the same than to do an entire fresh run. :slight_smile:

Also, training a new run, regardless of minor opening-style preferences, will almost certainly not solve this problem either:

Also, considering that a lot of AI moves come with an asterisk saying “provided the next 50 moves are played perfectly”

I’m not sure how it could possibly be expected to.

So what is a possibly better solution? Well, let’s put that in a followup post.


Those… are some very good points. What would you propose, then?

If you want to have an AI that plays in a more human-like way, I think it would be better to train an AI for just that goal: train it to predict/imitate humans at some specific level, without having it train by self-play afterwards (because that would make it surpass that level).

Such a training programme could even aim at training AIs to play like specific human individuals, so you could create a match between a “Shusaku” AI and a “Takemiya” AI.
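To make that a little more concrete, here is a small, self-contained sketch of the very first step one might take toward a per-player imitation dataset: gathering all of one player’s games from a folder of SGF files. Everything here (the directory layout, the plain regex matching on the standard PB/PW player-name properties) is an illustrative assumption, not an existing tool.

```python
# Hypothetical sketch: gather the games of one target player from a directory
# of SGF files, as raw material for a per-player imitation dataset.
# PB[...] and PW[...] are the standard SGF properties for the Black and White
# player names.
import re
from pathlib import Path

def collect_games_for_player(sgf_dir: str, player_name: str) -> list[Path]:
    """Return the SGF files in which player_name appears as Black or White."""
    matches = []
    for path in Path(sgf_dir).glob("**/*.sgf"):
        text = path.read_text(errors="ignore")
        black = re.search(r"PB\[([^\]]*)\]", text)
        white = re.search(r"PW\[([^\]]*)\]", text)
        names = {m.group(1).strip() for m in (black, white) if m}
        if player_name in names:
            matches.append(path)
    return matches

# e.g. games = collect_games_for_player("collections/pre-zero", "Takemiya Masaki")
```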

Followup to the earlier post: so what is a better solution, one likely to be hundreds of times cheaper than doing a whole new run? Well, there has been some interesting development on this front in chess-land, with Maia: https://maiachess.com/.

Okay, let’s dig in and analyze this:

Firstly, one thing to note is that Maia is NOT a reinforcement-learning loop and has nothing to do with AlphaZero. It is literally just supervised learning (SL) on human data, with good analysis behind it. An interesting discovery from that analysis is that you can’t add too much search on top of it, or else it becomes much less “human” again. They actually found you can’t add any at all, but I suspect it will be different for Go, especially if you do it “right”. In any case, the amount of search you can reasonably add will be limited, or else we get the “perfect 50 moves” phenomenon (which is exaggerated, by the way - even KataGo doesn’t normally read 50 moves deep), so raw policy accuracy will be critical.

Okay, there’s an issue though - in Go, we know that any time someone tries to train a net purely with SL on human data, it caps out at about 4d to 5d in raw policy strength at absolute best, and often more like 2d or 3d. People knew this well before AlphaZero - you can look at things like CrazyStone and Zen back in the day, which also used SL-trained neural nets. Even for human uses, that’s not as strong as you’d like.

So I think what might work better is to fine-tune existing nets on human data. KataGo’s raw policy is closer to 8-dan amateur, has good move diversity (due to temperature-based training and policy target pruning), and even has some understanding of ladders beyond its input features (which you will need, since ladders are the major thing humans glaringly do better than pure-Zero AI). The idea is that, starting from such a stronger point, we might be able to adjust towards human style through SL fine-tuning without the same issues that would normally cap you at 4d to 5d. It would not be all that expensive to try computationally - once the code is written, you just need one GPU and a bit of time.
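Mechanically, the fine-tuning step itself is nothing exotic. Here is a minimal, heavily simplified PyTorch sketch; the names are placeholders for illustration (`net` stands in for a pretrained policy network, not KataGo’s actual architecture, and `human_move_loader` for a data loader yielding board features paired with the human move actually played, extracted from SGF records).

```python
# Minimal sketch of supervised fine-tuning of a pretrained policy net on human
# moves. "net" and "human_move_loader" are placeholders, not KataGo's real
# training pipeline.
import torch
import torch.nn.functional as F

def finetune_policy(net: torch.nn.Module, human_move_loader,
                    epochs: int = 1, lr: float = 1e-5) -> None:
    # Small learning rate: nudge the strong net toward human move choices
    # without wiping out what it already knows.
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    net.train()
    for _ in range(epochs):
        for board_features, human_move in human_move_loader:
            opt.zero_grad()
            policy_logits = net(board_features)       # (batch, num_moves) logits
            loss = F.cross_entropy(policy_logits, human_move)
            loss.backward()
            opt.step()
```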

Secondly, it’s worth being clear about the goal again here. It would be only a small success to have a “human-style” bot that plays at (say) just slightly beyond human pro level in terms of analysis reliability, and does nothing else. What you would really like is a bot of full top-AI strength that also understands how players of all ranks play, including even how kyu players play.

If you can do this, it opens up a huge range of possibilities - for example, a 7-kyu player could ask “please give me the moves that a 3-kyu player would find intuitive and would likely be able to play accurately” to learn what they need to cover the next 4-stone gap, but with pro-level or KataGo-level background accuracy of analysis, so as to avoid also teaching them 3-kyu bad habits.

How to implement that effectively would take research, of course. I don’t know exactly how you would do it. But I don’t think you get there simply with an “AG Master”-like bot that only has slightly different preferences in the opening. Having pondered it for a while, and also taking into account Maia’s findings on the human-correspondence accuracy of nets in chess, I think the starting experiment to try, to begin to achieve the above, is:

Take a top neural net, such as KataGo 40b, and attempt to fine-tune it on human data, with the ranks of the players labeled in the input to the net, so that the net can explicitly model how players of each rank play.

This allows you at usage time to specify “please show me the policy intuition for players of rank X” for any X. Depending on how well that works, there are potentially lots of directions to take it from there.
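As a sketch of what “rank labeled in the input” could mean concretely, here is one toy way to condition a policy net on the player’s rank and then query it for a given rank at analysis time. All names, sizes, and the rank-to-index mapping are made-up assumptions for illustration, not KataGo’s actual inputs or architecture.

```python
# Toy sketch of rank conditioning: the player's rank is fed to the net as an
# extra input alongside the board features, so one network can model many
# levels of play. All sizes and names here are assumptions.
import torch
import torch.nn as nn

BOARD_CHANNELS = 22   # assumed number of board-feature planes
NUM_RANKS = 39        # assumed mapping: 30 kyu .. 9 dan -> indices 0..38
BOARD_SIZE = 19

class RankConditionedPolicy(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.rank_embed = nn.Embedding(NUM_RANKS, channels)
        self.trunk = nn.Sequential(
            nn.Conv2d(BOARD_CHANNELS, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.policy_head = nn.Conv2d(channels, 1, 1)

    def forward(self, board: torch.Tensor, rank: torch.Tensor) -> torch.Tensor:
        x = self.trunk(board)
        # Broadcast the rank embedding over the board as a per-channel bias.
        x = x + self.rank_embed(rank)[:, :, None, None]
        return self.policy_head(x).flatten(1)     # (batch, 361) move logits

# Usage at analysis time: "show me the policy intuition of a 3-kyu player".
# net = RankConditionedPolicy()
# board = torch.zeros(1, BOARD_CHANNELS, BOARD_SIZE, BOARD_SIZE)
# three_kyu = torch.tensor([27])                  # 3 kyu under the assumed mapping
# probs = torch.softmax(net(board, three_kyu), dim=1)
```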


If one only cared about the opening, one could maybe just give KataGo an opening library to pick moves from while they exist, and when you hit the end of a line in the library, let it go back to normal KataGo.

That’s probably not the intention though. It could be interesting to see how it turns out anyway, that kind of KataGo Master.
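For completeness, a rough sketch of that opening-library fallback idea; the book format and the `ask_engine` callback are assumptions for illustration, not an existing KataGo feature.

```python
# Sketch: follow a book of human opening sequences while the current position
# is still in the book, and fall back to the normal engine once the line runs
# out. The book format and "ask_engine" are illustrative assumptions.
def choose_move(moves_so_far: tuple[str, ...],
                opening_book: dict[tuple[str, ...], str],
                ask_engine) -> str:
    """moves_so_far is the game so far as GTP-style coordinates, e.g. ('Q16', 'D4')."""
    book_move = opening_book.get(moves_so_far)
    if book_move is not None:
        return book_move              # still inside the library
    return ask_engine(moves_so_far)   # end of the book: let the engine take over

# Example book with one short line:
# opening_book = {(): "Q16", ("Q16",): "D4", ("Q16", "D4"): "R4"}
```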

If the AI needs to behave like and mimic specific players, it not only needs to learn their opening moves but also how humans “blunder”. How they lose their games might be the key to making it human-like.

Different players make different kinds of blunders at different stages of the game, just as they have different favorite opening strategies.