A couple of questions about handicap and rating

Samraku · April 28, 2024, 12:05pm

With komi 0.0 and no compensation for handicap stones, yes, by definition in my opinion

Samraku · April 28, 2024, 4:42pm

Finally lost provisional, and am 4k, so one rank below my Main Account. It remains to be seen if I can climb higher

xela · April 28, 2024, 11:26pm

So back to the original question:

If someone has significantly different ratings for live and correspondence, why are handicaps being determined by the overall rating, and not by the appropriate rating for the time control? It looks like we have a system that’s designed to avoid the need for multiple accounts, but isn’t actually doing its job. And the various chess sites have demonstrated that it can work very well this way.

hoctaph · April 29, 2024, 12:41am

I forget which threads, but I remember reading that this is because the overall ratings are more accurate, due to accounting for more games.

Samraku · April 29, 2024, 9:06am

Yep, that’s the party line, but even granting that’s true, I maintain that there are compelling reasons to compromise on that in order to provide a better overall experience

xela · April 29, 2024, 10:37am

Um, what does “more accurate” mean in this context? Is there data to prove that people don’t actually play at different strengths in different time limits, and that differences in the ranks are some kind of illusion? I’m sceptical, but willing to be convinced… In any case, if creating multiple accounts is the recommended option, then it doesn’t look like the system is working.

Jon_Ko · April 29, 2024, 3:26pm

It means the overall rating was better at predicting the outcome of games.

xela · May 1, 2024, 11:41am

OK, that’s surprising and interesting! Can anyone easily dig out a link to the threads where this was discussed?

Jon_Ko · May 1, 2024, 12:26pm

Regenwasser · May 1, 2024, 1:51pm

Allegedly. They neither published the data nor did they make it available via the API. So there is no way for me to verify this. This topic has come up before.

benjito · May 1, 2024, 2:13pm

Ratings information exists via the public API, otherwise it could not be displayed on the profile page. It’s possible to see what APIs are being called if you go to Browser Developer Tools > Network tab.

I can share a route once I get to a computer, but I figured “teaching to fish” might be more useful anyway.

Regenwasser · May 1, 2024, 2:15pm

You’re right, in theory the data has to come from somewhere. But I did exactly that a year ago and there was some issue.

But I’m looking forward to being taught how to fish. Maybe you can figure it out!

benjito · May 1, 2024, 2:27pm

Okay, just checked. There are a few XHRs from that page, but I think the one you’re looking for is /termination-api/player/{PLAYER_ID}

The ratings field has the full breakdown from the ratings table:

"ratings": {
    "9x9": {
      "rating": 1388.705030253521,
      "deviation": 68.04753746552261,
      "volatility": 0.060000229890117066
    },
    "live": {...},
    "13x13": {...},
    "19x19": {...},
    "overall": {...},
    "version": 5,
    "live-13x13": {...},
    "live-19x19": {...},
    "correspondence": {...},
    "correspondence-9x9": {...},
    "correspondence-13x13": {...},
    "correspondence-19x19": {...}
  },
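
For anyone who wants to pull that breakdown with a script instead of the browser, here is a minimal Python sketch. Only the route and the shape of the `ratings` field come from the response above; the host `https://online-go.com`, the helper names `fetch_player` and `rating_table`, and the sample payload are assumptions made for illustration.

```python
import json
from urllib.request import urlopen

# Assumed host; only the route itself is taken from the post above.
BASE_URL = "https://online-go.com"


def fetch_player(player_id: int) -> dict:
    """Fetch the player JSON from the route mentioned above (needs network access)."""
    with urlopen(f"{BASE_URL}/termination-api/player/{player_id}") as response:
        return json.load(response)


def rating_table(player: dict) -> dict:
    """Return {category: (rating, deviation)} for every category in `ratings`.

    Non-dict entries such as "version" are skipped.
    """
    table = {}
    for category, entry in player.get("ratings", {}).items():
        if isinstance(entry, dict) and "rating" in entry:
            table[category] = (entry["rating"], entry["deviation"])
    return table


# Hypothetical sample payload with the same shape as the excerpt above.
sample = {
    "ratings": {
        "9x9": {"rating": 1388.7, "deviation": 68.0, "volatility": 0.06},
        "overall": {"rating": 1500.0, "deviation": 65.0, "volatility": 0.06},
        "version": 5,
    }
}

for category, (rating, deviation) in sorted(rating_table(sample).items()):
    print(f"{category:>8}: {rating:7.1f} ± {deviation:.1f}")
```

With a real ID, `rating_table(fetch_player(PLAYER_ID))` should give the same kind of table for the current ratings, assuming the response keeps this shape.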

shinuito · May 1, 2024, 3:40pm

I don’t really understand the scepticism.

I feel like anoek has been fairly open to discussing any aspect of the data analysis

and previously

and you can even download a big batch of games and the rating system code

You can do your own experiments with that.

But specifically

2020 Rating and rank tweaks and analysis

From this data there’s two things to note:

Using a combined rating works quite well, certainly comparable or better than looking at per-size strengths by themselves. It seems to me like it makes sense to keep using it.

Using overall ratings to predict 9x9 games works pretty good at HC 0 and at HC 1, indicating to me that the strength bands are pretty compatible with 19x19 or just “go ranks” in general. However, going beyond HC 1, predictions start to get bad pretty quick. I believe this is an indication that the “Old Japanese Recommendation” is not so great for us, and that we should strongly consider figuring out what the best 9x9 (and probably 13x13) handicap setup should be.

EDIT: The question arose about considering blitz vs live vs correspondence ranks, here’s the data from that, which I believe is still very supportive of using an overall rank for picking handicap.

Etc.

Regenwasser · May 1, 2024, 4:07pm

Ah, now I remember the issue I had a year ago when I did this. Look, what I need for this analysis is all the game records of the players plus their rating information in each category at the time of playing. Through the API I’m able to get the game records and I’m able to get the current rating information, but I was not able to retrieve all of the player’s category rating information at the time of playing. That is the route I’m looking for.
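
To make the kind of comparison concrete, here is a rough Python sketch of what could be done once the ratings at the time of playing are available. Everything in it is an assumption for illustration: the `GameRow` record, the Elo-style win probability (which ignores rating deviation and handicap), and the made-up rows. It is not the method used in the published analysis.

```python
import math
from dataclasses import dataclass


@dataclass
class GameRow:
    # Hypothetical record: both players' ratings as they stood when the game started.
    black_overall: float
    white_overall: float
    black_category: float
    white_category: float
    black_won: bool


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo-style win probability for A; ignores rating deviation (simplification)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def mean_log_loss(rows, use_category: bool) -> float:
    """Average negative log-likelihood of the actual results; lower is better."""
    total = 0.0
    for row in rows:
        if use_category:
            p = expected_score(row.black_category, row.white_category)
        else:
            p = expected_score(row.black_overall, row.white_overall)
        total -= math.log(p if row.black_won else 1.0 - p)
    return total / len(rows)


# Made-up rows purely to show the call pattern; not real data.
rows = [
    GameRow(1500, 1400, 1650, 1400, True),
    GameRow(1500, 1600, 1450, 1600, False),
]
print("overall :", mean_log_loss(rows, use_category=False))
print("category:", mean_log_loss(rows, use_category=True))
```

Whichever rating gives the lower loss on the same set of games is the better predictor for that set, which is why the per-game historical ratings matter.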

Sceptic by nature, nothing personal.

I haven’t read through them again but iirc the first two links you posted only go into the result of the analysis and do not provide the source data. The goratings-repo is new to me, maybe that’s exactly what I was looking for. Thanks for the link. I will check it out when I’ve got time.

Serjpinski · May 1, 2024, 4:45pm

About this data, I’m not very convinced.

The total count of games doesn’t match: the tables for overall rating have more games in all handicaps. As far as I understand, the same games should be used for predicting with overall rating vs rating for different sizes/speeds.
Even if this data is correct and the overall rating works well for a majority of players, that doesn’t rule out that there is a problem with a specific type of player who is significantly stronger at some specific size/speed. Does this type of player exist in reality or not? That’s what we should investigate. If it exists, maybe the solution would be to let the player decide if the system should use their overall or specific rating.
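
One rough way to look for such players, sketched in Python: flag any category whose rating sits further from the overall rating than a couple of combined deviations. The function name `outlier_categories`, the threshold of 2, and the numbers are all invented for illustration, and the test is only a heuristic, since the overall rating is partly built from the same games as each category.

```python
import math


def outlier_categories(ratings: dict, threshold: float = 2.0) -> dict:
    """Return {category: gap} where the category rating differs from "overall"
    by more than `threshold` combined deviations.
    """
    overall = ratings["overall"]
    flagged = {}
    for category, entry in ratings.items():
        if category == "overall" or not isinstance(entry, dict):
            continue
        gap = entry["rating"] - overall["rating"]
        combined = math.hypot(entry["deviation"], overall["deviation"])
        if abs(gap) > threshold * combined:
            flagged[category] = gap
    return flagged


# Invented numbers for illustration only.
ratings = {
    "overall": {"rating": 1500.0, "deviation": 60.0},
    "correspondence-9x9": {"rating": 1900.0, "deviation": 70.0},
    "live-19x19": {"rating": 1480.0, "deviation": 90.0},
    "version": 5,
}
print(outlier_categories(ratings))
```

A category with few recent games will have a large deviation and so is less likely to be flagged, which separates "hasn’t played this much lately" from a real specialist to some degree.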

square.defender · May 1, 2024, 4:51pm

If someone always plays 9x9, has reached a high level, and has never played 19x19, of course they would be some ranks weaker on 19x19 for a while. It’s rare but possible.

square.defender · May 1, 2024, 4:55pm

The current system may be better for most players, but of course it would inevitably cause problems in those rare cases.

square.defender · May 1, 2024, 4:57pm

100% separate ranks may be better for those rare cases, but they may cause problems for most players.

benjito · May 1, 2024, 5:30pm

This ^

I’ve seen a few players (including myself) with one or more outliers. But often it comes down to “this player hasn’t played this setting very much recently”. I’d be interested to know to what extent these specialized players exist.

Still, I think anoek’s finding holds pretty well overall. I haven’t seen strong counterexamples.