Also note that games from go servers are usually public.
Most notably, the first version of AlphaGo was trained using games from the KGS archive. DeepMind did not contact each player personally to ask whether they were okay with their games being used this way.
There is no legal precedent in go that I am aware of, but there are legal precedents in chess. As far as I know, it has always been judged that games are in the public domain if they were played in public, i.e., during a public tournament or on an internet server with a public database.
So, you could just use the OGS database of games directly.
I originally designed the range to be 0-10000, but found that most scores were larger than 5000.
So I changed it to range from -10000 to 10000.
In addition, even if you play worse than a pass or better than the AI, the score still stays between -10000 and 10000.
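Roughly, one way such a scaling could work (a sketch only: the linear interpolation and the clamping are assumptions, and the names `move_value`, `pass_value`, `ai_best_value` are just illustrative, not my actual implementation):

```python
# Minimal sketch of mapping a move onto a fixed score range.
# `pass_value`, `ai_best_value`, `move_value` are hypothetical inputs: the
# engine's evaluation of a pass, of its own best move, and of the move played.

def move_score(move_value: float, pass_value: float, ai_best_value: float) -> float:
    """Scale a move so a pass maps to -10000 and the AI's best move to +10000."""
    span = ai_best_value - pass_value
    if span == 0:
        return 0.0  # degenerate position: a pass is already the best move
    # Linear interpolation: pass -> -10000, AI best -> +10000.
    score = -10000 + 20000 * (move_value - pass_value) / span
    # Clamp so moves worse than a pass or better than the AI's pick
    # still stay inside [-10000, 10000].
    return max(-10000.0, min(10000.0, score))


# Example: a move about halfway between a pass and the AI's best move.
print(move_score(move_value=5.0, pass_value=-20.0, ai_best_value=30.0))  # 0.0
```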
Thanks for clarifying. This gave me an idea: I’ve heard people claiming that 30 kyus basically play randomly. Personally I doubt that a lot, but I can’t back up my belief with any evidence. Once you have a somewhat stable relationship between your ratings and OGS ranks, could you run it against a few games of this random player: Random Bot? (Happy to play a few games against it if no reasonable games can be found.)
Not the most clever method, but this tournament, Through the Years: Long Correspondence, has more than 2000 players of various ranks, and all of them are listed on the tournament page.
I hope an AI finds this easier than random human Go players do, since we have enough evidence that humans are rather bad at guessing the rank of players:
I think a few reasons why it won’t be very precise can be found quickly. I have these:
Games with heavy fighting will give worse scores than calm ones, because huge mistakes are common in heavy fighting, while in calm games it’s hard to make mistakes worth more than 10 or 12 points. Admittedly, the score is relative to a fairly big blunder (a pass), which is also worth more in heavy fighting; but even so, in calm games even mediocre players usually find moves worth more than a pass, while in heavy fighting mistakes like adding yet another stone to an already dead group are common.
One will probably get better scores against stronger players. The reason: when I make a huge blunder (say I fail to protect a 30-point group from being killed), a strong opponent will punish it at once, while a weak opponent will make 2-point endgame moves instead, so I can repeat my blunder on the next move. And again. And again. We all know those heavily fluctuating AI graphs yelling at us: you both got it wrong the whole time.
I still like the idea of having such a score and relating it to ranks. I’m pretty sure that, averaged over many games, there will be a strong correlation.
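If someone wants to check that once the data exists, the comparison itself is trivial; a rough sketch (the numbers below are made-up placeholders, the real `ranks` and `avg_scores` would come from actual games):

```python
# Sketch of checking the score-vs-rank correlation over many games.
# Assumes you have, per player, a numeric OGS rank and the score described
# above averaged over all of their games.
import numpy as np

# Hypothetical placeholder data: (rank, average move score) pairs.
ranks = np.array([5, 12, 18, 25, 30, 33, 36, 40])
avg_scores = np.array([3200, 4100, 4800, 5600, 6300, 6800, 7200, 7900])

r = np.corrcoef(ranks, avg_scores)[0, 1]
print(f"Pearson correlation between rank and average score: {r:.2f}")
```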
Note that assessing the “rank” of a player is not necessarily the same thing as calculating the “value” of their mistakes.
Neural networks are good at putting objects into classes. Here, the classes are ranks of human players. “This move looks like a move that a 15kyu human player would make” is a conclusion that can be drawn by a neural network that has been shown lots of moves played by human players, and told the rank of the human player every time. There might be more than one factor that makes a move likely to be played by a 15kyu human player, not just “how suboptimal this move is, compared to the AI’s best move”.
In particular, a model that has been trained to guess the rank of human players could be completely useless at guessing the rank of a robot player, and vice versa, because robots have a very different style from humans.
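For concreteness, a toy sketch of what such a rank classifier could look like (the input encoding, the layer sizes, and the 40 rank classes are illustrative assumptions, not a design anyone has tested):

```python
# Toy sketch of a rank classifier: a network that looks at a board position
# plus the move that was played and predicts the player's rank bucket.
import torch
import torch.nn as nn

NUM_RANKS = 40      # hypothetical: 30k..1k plus 1d..9d, one class per rank
BOARD_PLANES = 3    # own stones, opponent stones, the move that was played

class RankClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(BOARD_PLANES, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 19 * 19, NUM_RANKS),  # logits over rank classes
        )

    def forward(self, board: torch.Tensor) -> torch.Tensor:
        # board: (batch, BOARD_PLANES, 19, 19) -> (batch, NUM_RANKS) logits
        return self.head(self.features(board))


# Training would pair each (position, move) with the rank of the human who
# played it and minimise cross-entropy over the rank classes.
model = RankClassifier()
dummy = torch.zeros(1, BOARD_PLANES, 19, 19)
print(model(dummy).shape)  # torch.Size([1, 40])
```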
You can download 27 million games in one shot. Look in the forum for directions.
From those games you could get not only the full SGF records but also the OGS ranks of both players.
But if the purpose is to measure overall go strength, it seems like a classifier as @ArsenLapin1 describes would work better than computing the value of mistakes.
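The ranks are stored in the SGF `BR` and `WR` properties, so pulling them out of the downloaded files is straightforward; a quick sketch (the directory name is just an example):

```python
# Pull the players' ranks out of downloaded SGF files.
# SGF stores them in the BR (black rank) and WR (white rank) properties;
# this just greps for them rather than doing a full SGF parse.
import re
from pathlib import Path

RANK_RE = re.compile(r"(BR|WR)\[([^\]]*)\]")

def ranks_from_sgf(path: Path) -> dict:
    """Return e.g. {'BR': '5k', 'WR': '3k'} for one SGF file."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    return {key: value for key, value in RANK_RE.findall(text)}

# Usage over a directory of downloaded games (directory name is hypothetical):
for sgf_path in Path("ogs_games").glob("*.sgf"):
    print(sgf_path.name, ranks_from_sgf(sgf_path))
```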
Actually, what I’m trying to achieve is an absolute measurement of Go strength, based on how each move performs between the worst move (a pass) and the best (the AI’s move). I compare the score with rank because I wanted to see how my implementation performs, e.g. I expect higher-ranked players to get higher scores.
Of course implementing a good measurement is not easy, but it would be useful if we had one. Some applications I can think of:
Comparing rankings between different systems, like OGS vs Tygem.
Supplementing the ranking system, e.g. detecting people who use fake accounts to play dummy games to bump their rank.
Comparing the strength of famous go players in history, or generally comparing two players’ strength without them actually playing a game.
Measuring how time settings affect player performance. We believe players play worse with less time, but how much?
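For that last point, once per-game average scores exist the comparison itself is simple; a rough sketch (the `games` data here is entirely made up as a placeholder, and the time setting would come from the SGF time-control properties):

```python
# Compare average move scores across time settings.
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder data: one entry per game.
games = [
    {"time_setting": "blitz", "avg_move_score": 5200},
    {"time_setting": "blitz", "avg_move_score": 5500},
    {"time_setting": "correspondence", "avg_move_score": 6900},
    {"time_setting": "correspondence", "avg_move_score": 7100},
]

by_setting = defaultdict(list)
for game in games:
    by_setting[game["time_setting"]].append(game["avg_move_score"])

for setting, scores in by_setting.items():
    print(f"{setting}: mean score {mean(scores):.0f} over {len(scores)} games")
```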