How Deep is your Go?

How Deep is your Go is a website that can take your SGF games and give you an OGS-scale rating number and rank.
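For readers unfamiliar with the OGS scale, here is a minimal sketch of how a rating number can be turned into a kyu/dan rank. The constants 525 and 23.15 are believed to match the conversion OGS uses, but they are an assumption here and are not confirmed anywhere in this thread. The function names are made up for illustration.

```python
import math

# Assumed constants for the OGS rating-to-rank curve (not confirmed in this thread).
A = 525.0
C = 23.15


def rating_to_rank(rating: float) -> float:
    """Map a rating to a continuous rank, where 30.0 corresponds to 1.0d."""
    return math.log(rating / A) * C


def format_rank(rank: float) -> str:
    """Render a continuous rank as kyu or dan with one decimal place."""
    if rank < 30:
        return f"{30 - rank:.1f}k"
    return f"{rank - 29:.1f}d"
```

With these helpers, `format_rank(rating_to_rank(some_rating))` gives a string in the same style as the ranks quoted in this thread.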

If you upload multiple files, make sure that the player name is spelled exactly the same in all of them, so that the program knows whether to evaluate the black or the white moves. If you upload just one file, or the name is otherwise ambiguous, you will need to enter the name in the box. Again, it must match the player name in the SGF record exactly.
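To illustrate why the spelling has to match exactly, here is a rough sketch of how a program could look up a player's color from the `PB` (black) and `PW` (white) properties of an SGF record. This is not the website's actual code. The function names are hypothetical, and the regex is a simplification that does not handle every corner of the SGF format.

```python
import re


def player_names(sgf_text: str) -> dict:
    """Return the black and white player names from the PB/PW properties."""
    names = {}
    for prop, color in (("PB", "black"), ("PW", "white")):
        # Values sit in brackets; a literal ']' inside a value is escaped as '\]'.
        pattern = rf"(?<![A-Z]){prop}\[((?:\\.|[^\]\\])*)\]"
        match = re.search(pattern, sgf_text, re.DOTALL)
        if match:
            # Drop the escaping backslashes to recover the plain name.
            names[color] = re.sub(r"\\(.)", r"\1", match.group(1))
    return names


def color_of(sgf_text: str, name: str):
    """Return 'black' or 'white' on an exact, unambiguous match, else None."""
    matches = [c for c, n in player_names(sgf_text).items() if n == name]
    return matches[0] if len(matches) == 1 else None
```

Because the comparison is an exact string match, a name that differs only in capitalization or a stray space would not be found.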

Those who have closely followed my every word on these forums :wink: might remember that I once mentioned working on my Master’s thesis. It involves training a neural network that can do what you see on the website, as well as building the website itself.

Well, I did that, and today is the day you can try it out for yourself. :slight_smile:

Shoutout with thanks to my friend Zerix, who provided his server with a suitable GPU to host this, and also advice and guidance for all the required web and domain setup steps. :pray:

Congratulations are in order, or?..

Once I actually graduate, sure. :slight_smile:
I still need to polish the thesis paper, address the final feedback and hand it in. There is a seminar presentation upcoming, and the defense (exam). With all the formalities it will take until December.

image

Error messages are admittedly not all that fleshed-out. It’s more like a haphazardly welded steampunk pile of junk. :sweat_smile:

The error message appears because you have either not entered a player name in the name field, or not entered one of the two player names exactly as they appear in the SGF file (either .stone.defender. or sandyfriend123).

If you upload more than one of your games, played against different opponents, you can leave the name blank and the program will find it automatically.
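One way the automatic detection could work is sketched below: look for the single name that appears in every uploaded game. This is a guess at the logic, not the site's real implementation, and `infer_player` is a hypothetical name.

```python
from collections import Counter


def infer_player(games: list):
    """Find the one name that appears in every game.

    games: list of (black_name, white_name) pairs, one pair per SGF file.
    Returns the name, or None if no name or more than one name qualifies.
    """
    counts = Counter()
    for black, white in games:
        # Use a set so a name is counted at most once per game.
        counts.update({black, white})
    common = [name for name, n in counts.items() if n == len(games)]
    return common[0] if len(common) == 1 else None
```

With a single game both names qualify, so the result is `None`. That is the ambiguous case where the name has to be typed into the box.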

I might fix that uninformative output in the near future, but not today. :wink:

image

This is not another bug report, is it? :sweat_smile: Maybe this is a good omen for your rank.

With a sample of 4-5 games, the model is about as accurate as Glicko-2.

last 6 games
image
image

I remember the OGS ranking has been adjusted a few times in the past. So, just out of curiosity, did you cut off or adjust the older records in your training data?

image
image
image
image
image
7 games

The complete dataset is filtered into a pool of games by several quality criteria, like only 19x19 no-handicap. All the games in the pool are then subjected to the same Glicko-2 implementation that OGS uses (goratings). Just like OGS re-calculated all the ratings, I’m using ratings calculated under this one system as training data. :slight_smile:
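As an illustration of this kind of filter, here is a minimal sketch that checks the two criteria named above (19x19, no handicap) using the standard SGF properties `SZ` and `HA`. The real pipeline applies several more criteria that are not listed here. The helper names are hypothetical, and treating a missing `SZ` as 19x19 follows the SGF default for Go rather than anything stated in this thread.

```python
import re


def sgf_property(sgf_text: str, prop: str):
    """Return the first value of an SGF property, or None if it is absent."""
    match = re.search(rf"(?<![A-Z]){prop}\[([^\]]*)\]", sgf_text)
    return match.group(1) if match else None


def in_pool(sgf_text: str) -> bool:
    """Keep only 19x19 games without handicap stones."""
    size = sgf_property(sgf_text, "SZ")
    handicap = sgf_property(sgf_text, "HA")
    # Assumption: a missing SZ means 19x19 and a missing HA means no handicap.
    is_19x19 = size in (None, "19")
    no_handicap = handicap in (None, "", "0")
    return is_19x19 and no_handicap
```

A stricter filter could instead reject records where `SZ` is missing altogether.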

By the way, this service is offered with zero warranty!

image
image

image
image

9 live even 19x19 games vs humans

IIRC OGS goratings still needs to be calibrated with anchors, and there was a massive survey back in the day. I think the ratings were calibrated to EGF or AGA ratings (or some weighted average, honestly not sure).

image
image
image


image
image

Based on just the small samples from people testing here, I wonder what the MSE or RMSE of your model is? (It seems to be pretty large, 3 or 4 ranks at least.)

The known OGS rankings provide ground truth, so it shouldn’t be hard to test.
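For reference, a minimal sketch of the two metrics, computed over pairs of predicted and known ranks on a numeric scale. The function names are illustrative, and no real results from the model are implied.

```python
import math


def mse(predicted, actual):
    """Mean squared error between two equally long sequences of ranks."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error, expressed in the same unit as the ranks."""
    return math.sqrt(mse(predicted, actual))
```

RMSE is usually the easier one to read here, since it is in ranks rather than squared ranks.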

The test should be done with fresh games. Something may be wrong with the ranks in games from the time before Glicko.

Test data doesn’t necessarily need to be fresh (latest) games. For output that is time-dependent, it can be better to split the test data out of the training dataset within a relevant time frame.

Think of it this way: people are constantly learning, and their strength can go up and down. We have no idea whether their strength has reached a relatively steady state, and even if it has stabilized, they can still jump up and reach another steady state. If you always just split off the tail of the dataset as test data, the model might not be fitted to predict current strength (but rather a projection).
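One possible reading of this suggestion, sketched under the assumption that every game carries a date: hold out the games from a chosen time window as test data instead of always taking the most recent ones. The function name and the data layout are hypothetical.

```python
from datetime import date


def split_by_window(games, start: date, end: date):
    """Hold out the games played in [start, end) as test data.

    games: iterable of (played_on, record) pairs, where played_on is a date.
    Returns (train, test) as two lists of the same pairs.
    """
    train, test = [], []
    for played_on, record in games:
        bucket = test if start <= played_on < end else train
        bucket.append((played_on, record))
    return train, test
```

Whether games played after the window should stay in the training set is a design choice. Keeping them lets the model see a player's later development, which may or may not be wanted.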

image

Nice job! I tested it with my games, only even 19x19 games against high kyu or low dan opponents.

  • 5 correspondence games: 2.0d
  • 5 live games with long time settings: 1.2d
  • 5 fast games on Tygem: 0.1k

Pretty consistent with the fact that game quality decreases with faster time settings.
