2021 Rating and rank adjustments

As far as I know, the server works exclusively with Glicko ratings, including when it calculates handicap and komi.
There is a formula that calculates kyu/dan ranks from the Glicko ratings, but my assumption is that it is only used in the user interface to show ranks to the user. So even if you changed this formula significantly, it would have no effect whatsoever on pairings or handicap calculation.
Correct me if I’m wrong.
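
The rating-to-rank conversion described above can be pictured as a display-only function of the rating. Below is a minimal sketch, not the server's code. The logarithmic shape, the constants `SCALE` and `FACTOR`, and the helper names are assumptions for illustration.

```python
import math

# Assumed constants for a logarithmic rating-to-rank curve. They are
# illustrative and may not match the server's actual formula.
SCALE = 525.0   # rating that maps to rank 0 (30k)
FACTOR = 23.15  # rank steps per unit of log-rating


def rating_to_rank(rating: float) -> float:
    """Map a Glicko rating to a continuous rank number (higher is stronger).

    In this sketch, rank 0 is 30k, 29 is 1k and 30 is 1d.
    """
    return FACTOR * math.log(rating / SCALE)


def rank_label(rank: float) -> str:
    """Turn a continuous rank number into a kyu/dan label for display."""
    whole = math.floor(rank)
    if whole >= 30:
        return f"{whole - 29}d"
    return f"{30 - whole}k"
```

With these assumed constants, `rank_label(rating_to_rank(1500.0))` gives `"6k"`. Changing `SCALE` or `FACTOR` would relabel everyone, but the underlying ratings (and anything computed directly from them) would stay the same.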

I believe the idea is that this would only be shown locally, and be cosmetic in the sense that the underlying rating system (Glicko) would not change. Hence, you might select the option to display AGA ranks, and all ranks would appear close to what they would be in the AGA, while I might select Tygem ranks, which would show the same players with different ranks that are close to what they would have on Tygem.

Meanwhile, when we get matched by the system, this is based on our “actual” rating, given in Glicko points. The kyu/dan ranks displayed on each of our screens are not used by the system; they are purely for show.

As I understand it, automatic handicap is based on the rank difference according to the actual OGS Glicko-to-rank conversion formula. So that would be the only thing that might not work if everybody used their own rank conversion from Glicko ratings.
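
To see why a per-user display formula would not disturb the handicap, here is a hedged sketch in which the server converts both ratings with one shared formula and derives stones from the rank difference. The logarithmic conversion, the function names and the 9-stone cap are assumptions; the sketch also ignores komi and the special case of a 1-stone (no komi) game.

```python
import math


def rating_to_rank(rating: float) -> float:
    # Assumed logarithmic conversion; constants are illustrative only.
    return 23.15 * math.log(rating / 525.0)


def auto_handicap_19x19(rating_a: float, rating_b: float, max_stones: int = 9) -> int:
    """Hypothetical 19x19 handicap: one stone per full rank of difference.

    The difference is taken on the server-side rank scale, so it would not
    change if a user picked a different rank scale for display.
    """
    diff = abs(rating_to_rank(rating_a) - rating_to_rank(rating_b))
    return min(max_stones, int(diff))
```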

Yet again, you are imagining that only 19x19 games are played. Nobody here but me seems to realize that ranking calculations must be different for different board sizes. Unless I’m wrong, and no one has clearly shown that I am.

There’s a big list of tables from last year that was supposed to show that the combined rating (the overall one, which includes 19x19, 13x13 and 9x9) was just as good a predictor of who would win as the individual ratings for each size (which OGS also keeps track of).

Does that mean that, if you play only 9x9, your overall rank is your rank for any other board size or for go association purposes? Nope! It just means what it means: on OGS, if you play another player with a lower rank, you should have a better chance of winning. Maybe in some cases the individual board ratings would do a better job, but what I drew from it is that, on average, the overall rating isn’t supposed to be too bad.
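
A comparison like the one in those tables comes down to checking predicted win probabilities against actual results. As a sketch, this is the Glicko-1 expected-score formula from Glickman's paper (the one linked later in this thread). OGS's production implementation may differ in details, for example by using Glicko-2 or adjusting for handicap.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Glicko-1 attenuation factor: a more uncertain opponent rating
    (larger RD) pulls the expected score toward 0.5."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score of a player rated r against an opponent rated r_opp
    with rating deviation rd_opp."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))
```

Evaluating a rating pool (overall versus per-size) could then mean scoring each game's prediction against its outcome and comparing the averages.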

We also don’t really know at the moment how well the new OGS ranks will map to other rating systems, but after a lot of games have been played, maybe there’ll be enough data, and the ranks will have settled enough, to draw up new conversion tables.

Incorrect: the assumption is not that 19x19 games are the only ones played, but rather that they are the only ones that matter as a guide to where to draw the boundaries between ranks. This is because that is where all serious tournament games are played. From this you can estimate handicaps for other board sizes (I use 1 stone on 9x9 ~ 2 stones on 13x13 ~ 4 stones on 19x19 = 4 ranks difference on 19x19).
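
The rule of thumb above is a fixed ratio per board size, so it can be written as a small lookup. This sketch only encodes that stated estimate. It is not an official OGS formula, and the names are hypothetical.

```python
# Rule of thumb from the post (not an official formula):
# 1 stone on 9x9 ~ 2 stones on 13x13 ~ 4 stones on 19x19 = 4 ranks.
RANKS_PER_STONE = {19: 1.0, 13: 2.0, 9: 4.0}


def handicap_stones(rank_diff: float, board_size: int) -> float:
    """Convert a 19x19 rank difference into handicap stones for a board size."""
    return rank_diff / RANKS_PER_STONE[board_size]
```

For a 4-rank gap, this gives 1 stone on 9x9, 2 on 13x13 and 4 on 19x19.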

Exactly. I think the ranks of players who play both 19x19 and 9x9 are used to get a scale for how strong a 19x19 1d, or 10k, etc. would be on 9x9. This scale can then be used to place players who only play 9x9 in the right relative position.
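
One hypothetical way such a scale could be built is to fit a straight line between the 19x19 and 9x9 ratings of players who have both, and then invert it for 9x9-only players. This is a sketch of the idea only. The linear model, the function names and the toy numbers are assumptions, not what OGS does (OGS keeps a combined rating).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_19x19_scale(rating_9x9, slope, intercept):
    """Invert the fit to place a 9x9-only player on the 19x19 scale."""
    return (rating_9x9 - intercept) / slope


# Toy inputs (hypothetical) for players with both ratings.
r19 = [1000.0, 1500.0, 2000.0]
r9 = [1250.0, 1500.0, 1750.0]
slope, intercept = fit_line(r19, r9)
```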

This does show, and perhaps this is the source of all the confusion, that the comparative strength of players is nonlinear. It is impossible to accurately predict how a 19x19 player will do against a 9x9 player, or vice versa, yet the same rank is used. Generally speaking, 9x9 and 19x19 skill are heavily correlated, much more so than chess skill and Go skill would be.
In fact, strictly speaking, ranking is not even transitive: if player A wins most games against player B, and player B wins most games against player C, it does not automatically follow that player A wins most games against player C. Different playstyles can relate to each other like rock-paper-scissors. Of course, a ranking system that incorporates these things is perhaps more accurate, but definitely not more insightful.

Whenever I talk about ranks and handicap, I take the 19x19 situation as the basis, since this seems to be universally the most played size (many servers don’t even offer ranked 9x9 games). The whole story works analogously for 9x9, except that we scale it to be comparable to the 19x19 rating rather than using the 1 handicap stone = 1 rank scaling.

When cycles like this occur, I would expect these players to have similar strength (I don’t think you’ll find real life players where player A can consistently give 9 stones handicap to player B, who can consistently give 9 stones handicap to player C, who can consistently give 9 stones handicap to player A). Their individual ratings will depend mostly on how well they do against opponents outside of this group.
Rating systems need a large group of players with lots of interaction between subgroups to work well. If the population consists of many small, isolated pockets of players, each subgroup can drift away from the others, and the rating system won’t work very well.

I see a rating system a bit like a thermodynamic system, where each player is like an atom. In a thermodynamic system, the law of large numbers comes into play on macroscopic scales, so there can be an aggregate (semi-)equilibrium with properties like pressure and temperature gradients. You can find temporary local anomalies (such as the rock-paper-scissors cycle described above) when you zoom in on individual atoms, but those shouldn’t affect the macroscopic picture when the system is large enough.

I wonder if OGS ever analysed their data on 9x9 handicaps. It seems OGS simply uses the AGA system of 4 (19x19) ranks per 9x9 handicap stone, but is this really the best fit to the data?

Wow, you are a bit quick to draw conclusions.

It’s so obvious to me that calculations regarding handicap stones must be different for different board sizes that I wouldn’t even bother to bring up the question in the first place.

I think this is indeed because most players will play with a wide variety of opponents. However, it’s quite easy to create a ranking system that will behave like my ABC example without being too contrived, e.g. a system that gives a “strategy rating” to players of Go, chess and checkers combined. Naturally, if we have (A) a strong Go player who knows some chess but never played checkers, (B) a strong chess player who knows some checkers but never played Go, and (C) a strong checkers player who knows some Go but never played chess, then these players will behave as you expect on an “average” game (1/3 of the games being Go, 1/3 chess, 1/3 checkers): A is expected to lose the chess and checkers games against B, but win the Go games, thus A loses 2/3 of the time against B. Similarly B loses 2/3 of the time against C, and C loses 2/3 of the time against A.
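
The arithmetic of this Go/chess/checkers example can be checked directly. In this sketch, the skill levels (2 = strong, 1 = knows some, 0 = never played) and the rule that the higher skill in a given game always wins are simplifying assumptions.

```python
from fractions import Fraction

# skill[player][game]: 2 = strong, 1 = knows some, 0 = never played.
skill = {
    "A": {"go": 2, "chess": 1, "checkers": 0},
    "B": {"go": 0, "chess": 2, "checkers": 1},
    "C": {"go": 1, "chess": 0, "checkers": 2},
}
GAMES = ["go", "chess", "checkers"]


def win_fraction(p: str, q: str) -> Fraction:
    """Fraction of an even mix of games that p wins against q, assuming the
    higher skill in a given game always wins."""
    wins = sum(1 for game in GAMES if skill[p][game] > skill[q][game])
    return Fraction(wins, len(GAMES))
```

`win_fraction("A", "B")`, `win_fraction("B", "C")` and `win_fraction("C", "A")` each come out to 1/3, so every player loses 2/3 of the time to the next one and no single ordering of A, B, C is consistent with the results.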

Since the different board sizes are quite similar in Go, it’s probably not going to be as large as 9 stones handicap, but I’m quite sure we can find players as in the A, B, C scenario, that are similarly rated by OGS, where we replace “Go”, “chess” and “checkers” above by “19x19”, “13x13” and “9x9”.

The OP (original posting) of this thread describes the Go modification of the Glicko System (see http://www.glicko.net/glicko/glicko.pdf). The original Glicko system is for chess, not Go (sorry to capitalize the name of my favorite game), and chess has a fixed board size of 8x8.

The OP does not mention board size once. While I have not read all of the more than 500 postings in this thread due to lack of time, almost all the recent postings (other than my own) have not mentioned board size.

So, to those who believe that I am making a snap judgement: no, I am simply stating a fact. Most go players play 19x19 games and assume that’s the way the game is played.

Now, in my many postings here, I have not objected to the new OGS rating system for 19x19 games. I don’t have enough experience to make any such objection, and I assume it works fine.

The only point I have been focusing on here is that those who play only or mostly 9x9 games (such as myself) have not been given a reasonable new ranking. Few of the other posters here have addressed that point, perhaps because Go players may enjoy argument for its own sake (just to invent a reason for something that puzzles me).

Again, my point is that if a rating or ranking that is calculated for 19x19 games is used to determine a rating or ranking for a 9x9 game player, that rating or ranking may be incorrect in absolute (not relative) value, since a single stone played has a very different effect.

All I have asked for is some verification, some evidence that this new ranking is reliable in the case of 9x9 players. Does the AGA or EGF ever rank 9x9 players? Probably not. Can experienced 9x9 players rank themselves? Probably, to some extent. In the absence of any verification, I have asked for separate ratings and rankings for 9x9, 13x13, and 19x19 games. That’s three calculations for each player rather than one. I see this issue as far more important than separating the ratings and rankings for the three time settings offered on OGS.

So, perhaps what I am asking for is impossible. Or, perhaps, it is possible. I’m just asking those who really know about these things to think about them, and respond to what I’m actually saying.

There are a few misconceptions and/or misunderstandings…

1- Absolute ranking does not exist. Rankings are defined relative to a group of people. You being 9k in OGS only means that. If you play in another server, you’ll have a different ranking ranging from 5k to 14k according to some (probably outdated) comparisons.
2- Rankings are not supposed to measure your strength. Rather they are aiming to sort players from weak to strong, so they can be compared (and matched to play).
3- OGS does give you a separate ranking for each boardsize and time setting. You appear to be 7k-9k for live 9x9 games. That only means that when playing other 7k-9k players from OGS you’ll have an even game.
4- The way to convert Glicko to kyu/dan is irrelevant (as long as the conversion is monotonic, I think; maybe there is some other requirement). That said, it makes more sense to use a nice, smooth, interpretable function.
OGS Developers have chosen to have the origin in the [I don’t remember the details] 1d AGA and the step size measured by 19x19 handicap. That’s a fine mapping function, right there. It gives us some insight to compare partially to another pool of players (AGA,EGF), although strictly speaking it’s only valid to compare OGS players.
5- Units are meaningless. We could get rid of the kyu/dan and use “david265” units, where you are 1david265 and someone is 9david265 if you need 9 stones to beat him/her in 9x9.
Then we’d use Glicko to sort the players and map the rest of the rankings using that function…
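
The point about monotonic conversions can be shown in a few lines. Any strictly increasing mapping from rating to label sorts players exactly as the raw ratings do, whereas a non-monotonic one does not. The ratings and both mappings below are hypothetical and chosen only for illustration.

```python
import math


def order_by(ratings: dict, label) -> list:
    """Names sorted from weakest to strongest by label(rating)."""
    return sorted(ratings, key=lambda name: label(ratings[name]))


def log_ranks(rating: float) -> float:
    # One assumed mapping: logarithmic, like a kyu/dan conversion.
    return 23.15 * math.log(rating / 525.0)


def toy_units(rating: float) -> float:
    # Another assumed mapping: a plain linear rescaling in made-up units.
    return (rating - 1000.0) / 100.0


def folded(rating: float) -> float:
    # A non-monotonic mapping, for contrast: it scrambles the order.
    return abs(rating - 1500.0)


# Hypothetical ratings, for illustration only.
ratings = {"ann": 1320.0, "bo": 1710.0, "cy": 1490.0}
```

Here `order_by(ratings, log_ranks)` and `order_by(ratings, toy_units)` both return `["ann", "cy", "bo"]`, the same as sorting by rating, while `folded` puts `"cy"` first.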

Conclusion… The number next to your username is only a label used to find suitable players for an even game. Reading anything more into it just means that you are misunderstanding the rating system.
(with the new formula the new ranking gives a rough approximation of your ranking for AGA/EGF games, which is nice!)

I’m not sure how I can explain it even more clearly:

OGS uses one single sole individual rating system for ALL board sizes and speeds.

You keep hammering on 9x9 being ignored: it’s not. It’s not getting treated differently than 19x19. It counts just as well for your overall rating. All three of 9x9, 13x13 and 19x19 board sizes and all time settings are included in the rank calculation.


Handicap stones are distributed differently for 9x9 than for 19x19, so a difference of 1 rank does not equal a difference of 1 handicap stone. Instead, a conversion formula is used. That’s all there is to it. Whether this is accurate is something @gennan discussed above, but I don’t believe this is what is bothering you.

These have nothing to do with each other. Nowhere in the computation does Glicko use anything other than who won the game and who lost the game. Board size is not a part of Glicko.

If you’re confused about what Glicko is, just replace it with “Elo” whenever you see it, and the story will be completely analogous.
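
To make the Elo analogy concrete, here is a plain Elo update. Its only per-game input is who won. The K-factor of 32 is a common textbook choice used here as an assumption, and this sketch also ignores handicap.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return the updated (r_a, r_b).

    The only game information used is who won: board size and time control
    never enter this formula.
    """
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta
```

For example, `elo_update(1500.0, 1500.0, True)` returns `(1516.0, 1484.0)`, and a 9x9 win would produce exactly the same numbers as a 19x19 win.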

About this: Yes, what you’re asking is impossible, as has been explained about 10 times now.

It’s hard to respond to what you’re actually saying, since you’re saying nonsensical stuff. People who really know about these things and think about them tend to understand that what you’re asking for is impossible and meaningless.

I think it’s quite contrived to combine 3 largely independent skills into one rating number. Yes, it’s not hard to make such a rating system, but what’s the use?

If I believed (I don’t) that go skills on different time settings and different board sizes are largely independent, then I would agree with @david265 that each should have its own separate rating system, based only on statistical analysis of game results and handicaps on that board size and time control, and that no attempt should be made to combine a 5d rating for correspondence 9x9 with a 25k rating for blitz 19x19.

I agree, so from a large-scale perspective rank is linear. My point is that on a small scale it is not linear. Probably these are within the uncertainty of other factors, like concentration, effort, etc.

I just wanted to point out that it’s not automatically a given that if you’re ranked higher, you’re also stronger. The different time settings and board sizes tend to be very similar to each other, since most players mostly stay within their own pool, and those who play in several pools tend to do so regularly (thus calibrating the different pools). The nonlinearity mostly becomes apparent when someone who has only been playing 9x9 starts playing 19x19, or someone who only plays correspondence starts playing blitz, etc.


Apart from this, it does seem that, according to some posts above that I can’t find now, using the overall rating or the individual rating per pool for matchmaking makes little difference. Hence the nonlinearity is really a local and small problem.

Yes, I doubt that an intransitive cycle could occur with very large rank discrepancies, but I believe it should be possible where each player is just slightly, but significantly, better than the next (in the cycle).

I would imagine that this phenomenon would be more prevalent at the lower ranks, where players are bound to have various strengths and weaknesses. Among higher-ranked players, like dans (traditional sense, not my Only𝒟𝒶𝓃𝓈 proposal), I would guess that the likelihood of an intransitive cycle is much lower, since such players need to shore up weaknesses in their play and be fairly strong all around.

Regarding intransitive cycles (or should the phrase be nontransitive?), here’s a neat concept:

To use the nontransitive dice for an OGS-related concept, think of three players who are equally skilled when sober in Go, but happen to play games after having had X beers, where X is distributed as the dice are.
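
For reference, here is a well-known set of nontransitive dice (not necessarily the one originally linked) with the pairwise odds computed exactly. Each die beats the next one in the cycle with probability 5/9.

```python
from fractions import Fraction
from itertools import product

# A well-known nontransitive set: A beats B, B beats C, C beats A.
DICE = {
    "A": [2, 2, 4, 4, 9, 9],
    "B": [1, 1, 6, 6, 8, 8],
    "C": [3, 3, 5, 5, 7, 7],
}


def p_beats(x: str, y: str) -> Fraction:
    """Probability that die x rolls strictly higher than die y."""
    wins = sum(1 for a, b in product(DICE[x], DICE[y]) if a > b)
    return Fraction(wins, len(DICE[x]) * len(DICE[y]))
```

In the beer version, fewer beers is better, so "lower roll wins". With these dice, no two faces from different dice are equal, so flipping the rule just reverses the direction of the cycle; it is still a cycle.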

Yes, if subgroups arise that have little contact with other subgroups, the ratings of the members of that subgroup can drift away from the “common truth” of the rating system. This could arise for different board sizes, time control, time zones, countries, languages, friend groups etcetera.

But even if you only play (for example) 9x9 correspondence with your friends in the PST time zone, this “rating island effect” does not occur automatically. As long as your opponents or your opponents’ opponents regularly play opponents from other subgroups, I would expect that your rating is still sufficiently “connected” to the ratings of all the other OGS players. I would not expect that your rating is proven completely wrong when you suddenly start playing opponents outside of your subgroup.

I suppose this “connectivity” model also has analogies in physics, such as diffusion or heat flow in a system composed of many components of different shapes, sizes and materials. This may be an interesting modelling question. Has anyone ever investigated the internal “connectedness” of OGS ratings?
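
A first, coarse look at “connectedness” could be to group players into connected components of the who-played-whom graph. Players in different components have no chain of opponents linking them, so their ratings are not anchored to each other. This is a hypothetical sketch. A finer measure would also weigh how many games bridge each subgroup, which this does not do.

```python
from collections import defaultdict


def rating_islands(games):
    """Group players into connected components of the game graph.

    games: iterable of (player_a, player_b) pairs, one per game played.
    """
    graph = defaultdict(set)
    for a, b in games:
        graph[a].add(b)
        graph[b].add(a)

    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search from a not-yet-visited player.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            player = stack.pop()
            component.add(player)
            for opp in graph[player]:
                if opp not in seen:
                    seen.add(opp)
                    stack.append(opp)
        components.append(component)
    return components
```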
