How Deep is your Go is a website that can take your SGF games and give you an OGS-scale rating number and rank.
If you send multiple files, make sure that the player name is exactly the same in all of them, so that the program knows whether to evaluate the black or the white moves. If it's just one file, or the name is otherwise ambiguous, you will need to enter the name in the box; again, it should match the player name in the SGF record exactly.
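The matching idea is simple enough to sketch (a simplified illustration, not the actual server code; a regex stands in for a proper SGF parser, which is what you'd really want):

```python
import re

def player_names(sgf_text: str) -> dict:
    """Extract the black (PB) and white (PW) player names from an SGF record.

    Simplified sketch: a real implementation should use a proper SGF
    parser (e.g. sgfmill), since this regex ignores escaped brackets.
    """
    names = {}
    for colour, prop in (("black", "PB"), ("white", "PW")):
        match = re.search(prop + r"\[([^\]]*)\]", sgf_text)
        if match:
            names[colour] = match.group(1)
    return names

def side_to_evaluate(sgf_text: str, player: str) -> str:
    """Return 'black' or 'white' depending on which side `player` played."""
    for colour, name in player_names(sgf_text).items():
        if name == player:  # exact match, as required
            return colour
    raise ValueError(f"{player!r} is neither PB nor PW in this record")
```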
Those who have closely followed my every word on these forums might remember that one time when I mentioned working on my thesis for my Master's degree. That work included training a neural network to do what you see on the website, and building the website itself.
Well, I did that and today is the day where you can try it out for yourself.
Shoutout with thanks to my friend Zerix, who provided a server with a suitable GPU to host this, as well as advice and guidance for all the required web and domain setup steps.
Once I actually graduate, sure.
I still need to polish the thesis paper, address the final feedback, and hand it in. There is an upcoming seminar presentation, and then the defense (exam). With all the formalities, it will take until December.
Error messages are admittedly not all that fleshed-out. It’s more like a haphazardly welded steampunk pile of junk.
The error message appears because you have either not entered a player name in the name field, or not entered one of the two player names exactly as it appears in the SGF file (either .stone.defender. or sandyfriend123).
If you give more than one of your games, against different opponents, you can leave the name blank and the program will find it automatically.
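The automatic detection boils down to a set intersection: the submitter is the one name common to all the files. A minimal sketch of that idea (not the actual server code):

```python
def detect_player(games: list) -> str:
    """Find the single name that appears in every game.

    `games` is a list of {'black': name, 'white': name} dicts, one per
    SGF file. With one game, or several games against the same opponent,
    the result is ambiguous -- hence the name box.
    """
    common = set(games[0].values())
    for game in games[1:]:
        common &= set(game.values())
    if len(common) != 1:
        raise ValueError("player is ambiguous; please enter a name")
    return common.pop()
```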
I might fix that uninformative output in the near future, but not today.
I remember that the OGS ranking system has been adjusted a few times in the past. So, just out of curiosity, did you cut off or adjust the older records in your training data?
The complete dataset is filtered into a pool of games by several quality criteria, such as 19x19 only and no handicap. All the games in the pool are then subjected to the same Glicko-2 implementation that OGS uses (goratings). Just as OGS re-calculated all the ratings, I'm using ratings calculated under this one consistent system as training data.
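In code terms, the preprocessing has roughly this shape (a sketch only; the field names and the `rating_system` interface are made up for illustration, and the real quality criteria are more extensive):

```python
def build_pool(raw_games):
    """Filter the raw game dump down to the training pool."""
    for game in raw_games:
        if game["board_size"] != 19:      # 19x19 only
            continue
        if game["handicap"] > 0:          # no handicap games
            continue
        # ... further quality criteria go here
        yield game

def recompute_ratings(pool, rating_system):
    """Replay the pooled games chronologically through one Glicko-2
    implementation (OGS uses goratings), so that every training label
    comes from a single consistent rating system."""
    for game in sorted(pool, key=lambda g: g["ended"]):
        rating_system.update(game)        # hypothetical interface
    return rating_system.ratings          # hypothetical attribute
```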
By the way, this service is offered with zero warranty!
IIRC the OGS goratings still need to be calibrated with anchors, and there was a massive survey back in the day. I think they were calibrated to the EGF or AGA rating (or some weighted average, honestly not sure).
Based on just the small samples from people testing here, I wonder what the MSE or RMSE for your model is. (It seems to be pretty large, at least 3 or 4 ranks.)
There is ground truth with known OGS rankings, so it shouldn't be hard to test.
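For concreteness, the metric itself is trivial once predictions are paired with known ranks on the same numeric scale (a sketch, not a claim about the actual numbers):

```python
import math

def rmse(predicted_ranks, true_ranks):
    """Root-mean-square error between predicted and true ranks, both on
    the same numeric scale (e.g. OGS rating converted to rank)."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_ranks, true_ranks)]
    return math.sqrt(sum(squared) / len(squared))

# rmse([12.0, 15.5, 20.1], [10.0, 17.0, 21.0]) ≈ 1.53
```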
Test data doesn’t necessarily need to be fresh (latest) games, sometimes for output that is time-dependent, it might be better to split test-data out of the training dataset with a relavent time-frame.
Think of it this way: people are constantly learning, and their strength can go up and down. We have no idea whether their strength has reached a relatively steady state, and even if it has stabilized, they can still jump up and reach another steady state. If you always just split off the tail of the dataset as test data, the model might end up fitting a projection of future strength rather than a prediction of current strength.
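To make the difference between the two splits concrete (a sketch; the `ended` timestamp field is an assumption):

```python
import random

def tail_split(games, test_fraction=0.1):
    """Naive split: the newest games become the test set. This risks
    evaluating the model as a projector of future strength rather than
    an estimator of current strength."""
    games = sorted(games, key=lambda g: g["ended"])
    cut = int(len(games) * (1 - test_fraction))
    return games[:cut], games[cut:]

def in_window_split(games, test_fraction=0.1, seed=0):
    """Alternative: draw test games at random from the same time frame
    as the training data, so both splits see the same distribution of
    (possibly drifting) player strengths."""
    games = list(games)
    random.Random(seed).shuffle(games)
    cut = int(len(games) * (1 - test_fraction))
    return games[:cut], games[cut:]
```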