Automatic handicap set by "Overall" rank

amodeo · July 9, 2017, 4:53pm

I have only recently started playing on OGS, and I like it, but there was one thing that surprised me that I don’t like.

When I offer to play a game, I always choose automatic handicap, so the game will more likely be a challenge for both players. I am often surprised with the result. Wouldn’t it be better to set an automatic handicap based on our ranks for the type of game we are playing, rather than by our “overall” ranks?

I just play what OGS calls “live.” I don’t play blitz or correspondence. Some of my opponents have a big difference between their live rank and their overall rank. They might be about as strong as I am in non-blitz, non-correspondence games, but play very much worse or very much better than that at blitz and/or correspondence games. Wouldn’t it be much better in such cases to play without a handicap?

Of course it works the other way around, too. If I play a game that is not blitz or correspondence, and my opponent and I have the same “overall” rank, but one of us is quite a bit stronger than the other in non-blitz, non-correspondence games, then wouldn’t a handicap based on our “live” ranks be more likely to make the game challenging for both of us than if we played without a handicap?

mlopezviedma · July 10, 2017, 2:54am

Hi, @amodeo, welcome to OGS!

As experience has shown with more than a year of using this system, the overall rank is the most representative of a player’s rank. Moreover, right now the developers are working on changing the way ranks are shown, and the overall rank is going to be the only one used in these calculations.

In an ideal world, you are right. But in practice, most users don’t play every time control. So you’ll see that most folks have only one rank that follows their overall rank at a quick pace, while the other two get stuck near their initial rank (most of the time 25k). This is why we mods need to constantly update those ranks when we find them, and this mostly happens when they are reported, so we usually fix it one or two games late. Fortunately, with the above-mentioned rank update we won’t need to do this anymore.

StevenageTony · February 10, 2023, 5:35pm

With respect, this statement would appear to be nonsense, except insofar as “a player’s rank” at the end of the statement might refer tautologically to “the overall rank” mentioned at the beginning. How can overall rank be more representative of a player’s strength in a certain category of play than that calculated by analysis of his previous play in that category? Ranks are supposed to be calculated so that there will be a 50% chance (or as near as possible to that) of either player winning if they have the same rank. It seems obvious to me that players need to be handicapped according to their results in the category of game they are about to play. If not, why not?

Otherwise, what is the point of calculating their rank in nine different categories anyway? You might as well just update their aggregate rank and show that single figure if that is what you are going to use to calculate the handicap and komi.

It seems to me that you would not have to manually adjust anything if you used the data you have for the particular type of game that is to be played to calculate the handicaps. It wouldn’t matter that someone never plays smallboard if you stopped factoring in their supposed rank in the categories they don’t play (much?) or that they have developed great skill on smallboard but quickly get lost on 19x19.

I see that this thread – consisting only of the same query that I came to make and a single, apparently far-from-satisfactory reply – closed five years ago. Perhaps the reason it didn’t generate more discussion at the time was that the reply referred to work in progress which was supposedly going to improve the situation. Did it (assuming that the work was completed)? I still see players who do well on small board getting no handicap in 19x19 live games against opponents supposedly 3 stones stronger in that category of play. How can that not be unfair?

teapoweredrobot · February 10, 2023, 5:57pm

With respect, I’m not sure there’s a great deal of sense in necroing a six-year-old thread, but here we are, so I’ll share my thoughts.

Firstly, it’s generally best to think about a generous purposive interpretation of things people write on the internet rather than following too much of a picky textual analysis.

In this case the sentence is not nonsense, but it might have been phrased differently. It is the case that the overall rank is most representative of a player’s “actual” strength, in that the overall rank is a better indicator of expected outcome than any of the individual rankings. This might be a surprise or counterintuitive but, as I understand it, that’s what the stats show. One way to think about it is that the individual ranks are necessarily calculated from smaller pools of games than the overall rank, so errors are more likely to be “smoothed out” in the overall rank.
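To illustrate the "smaller pools" point only: the sketch below uses the textbook standard error of an observed win rate, which shrinks as the number of games grows. It is not the rating model OGS uses, and the function name and game counts (400 overall, 25 in one category) are made up for the example.

```python
import math


def win_rate_standard_error(win_rate: float, games: int) -> float:
    """Standard error of a win rate observed over `games` independent games."""
    if games <= 0:
        raise ValueError("games must be positive")
    return math.sqrt(win_rate * (1.0 - win_rate) / games)


# Hypothetical game counts: all ranked games vs. a single category.
overall_error = win_rate_standard_error(0.5, 400)
category_error = win_rate_standard_error(0.5, 25)

# The estimate built from fewer games is noisier.
print(overall_error, category_error)
```

With sixteen times fewer games, the noise in the estimate is four times larger, which is the sense in which a per-category figure can be less trustworthy than the overall one.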

Just for fun and for people’s information.

Maybe we should - but I suspect that if we did, people would complain about that information being taken away. What would cause the least complaints?

This is one of the perils of replying to a six-year-old post - the manual adjustment that used to be required is no longer needed, thanks to the new ranking calculation system alluded to in the post now being in place (and since updated more than once, I think!)

There is more information in various other threads about this, and if I get the chance I’ll see if I can find it and link it.

I’d be happy to see the data on outcomes etc., but I suspect that you might be surprised as to the fairness of it. The skills needed to play go are generally very transferable across board sizes.

Groin · February 10, 2023, 6:09pm

To make things clear: for rating changes and then for pairing, whatever the kind of game, as long as it has a valid format (board size, ranked, etc.), there is only one rating used (the global one).
All the other detailed ratings are purely statistical information and are not used in any other way.

That is surely a wise decision, considering how much more complex it would be to separate the calculations (and aggregate them). It has been suggested, but after some “studies” it seemed that there was not enough distortion to go this way.

There are earlier threads on this. I encourage you to use the forum search (which is quite messy, I do agree!)

StevenageTony · February 10, 2023, 6:53pm

It’s better to save jargon like this for communicating with your fellow technocrats. Please use plain English when dealing with lay users. If you mean raising a question that was buried, the reason is that I don’t think it has been dealt with satisfactorily, since I was driven to ask exactly the same question and didn’t find the sole answer to it satisfactory.

I didn’t mean it doesn’t make sense, but that it is clearly wrong (as far as I can see), unless, trivially, it’s a tautology.

There is no such thing as a player’s “actual strength”. There is a rank representing the probability that the player will win against another player. This is determined, is it not, by analysing the player’s results against other players, taking into consideration the rank awarded to those opponents.

I don’t see how you can get a better expectation of the probable result of a game in a particular format by contaminating the data relating to previous performance in that format with data from play in other formats.

I thought the point of the ranks was not to wear as a badge of status but as a mathematical tool for calculating the expected outcomes of games and estimating what handicap and komi will most nearly give a 50% chance of either winning.
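As a sketch of what "calculating the expected outcome" means, here is the standard Elo-style logistic formula. Whether OGS computes expectations this way is an assumption on my part; the function name and the 400-point scale are just the usual Elo conventions.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo-style probability that player A beats player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Equal ratings give an even game; a rating gap tilts the odds.
print(expected_score(1500, 1500))
print(expected_score(1700, 1500))
```

Under a model like this, a handicap or komi adjustment is "right" when it brings the expected score back to roughly 0.5 for both players.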

Wrong question. Right question: what would give the most accurate handicaps/komi for particular matches?

If that information helps to answer the question, that will be great.

What surprises me is seeing someone who has played hundreds and hundreds of 19x19 live games and has been accorded a 13k rank for that type of play getting no handicap against someone who has played a similarly large number of games in the format and been assigned a rank of 10k, in a pairing for that very format of game. The best way you (or someone) can satisfy this enquiry is by answering this point.

Then why do players end up with very different ranks in the different categories? It’s not just size, either. There is categorisation by speed of play as well.

StevenageTony · February 10, 2023, 6:56pm

I searched and found exactly the question I wanted the answer to. If the same question has been asked elsewhere and has had a different (preferably better) answer, do please let me know.

Groin · February 10, 2023, 7:05pm

I remember that there is one in which @anoek (the owner of the site) gave a detailed answer on this.

qnpnpmqppnp · February 10, 2023, 7:11pm

You may want to be a tad less aggressive when writing your opinion.

Anyway, here is a detailed explanation: 2020 Rating and rank tweaks and analysis (see point 2).

I agree it’s counter-intuitive, but please consider that people already spent some time thinking about it before deciding on the current system. This doesn’t mean you have to agree, but that should call for caution before claiming this is complete nonsense and that the right solution is obvious.

Groin · February 10, 2023, 7:15pm

Another thing is that you and I are just some kind of influencers at most. Suggestions are usually welcome, but they are then weighed against all the other things to correct or improve.
When your suggestion was proposed, it seemed that it was a lot of effort for a marginal benefit, and you can consider that there are many other things to improve (or implement) too.

jlt · February 10, 2023, 8:24pm

I understand that if someone plays a lot in category A (e.g. correspondence) and much less in category B (e.g. live), then the overall rank is approximately the same as the rank in category A, and the rank in category B is not reliable due to lack of games. But I would conjecture that if someone plays a lot in both categories A and B (let’s say at least 10 games/month in each category) then the rank in category B is a better predictor than the overall rank for matches in category B.

Has this hypothesis been checked/proved/disproved? Sorry if I didn’t read all of the threads.

For instance I use two separate accounts. My account A is used for correspondence games and account B for live games. Account A is 2-3 stones stronger than account B, and yet both accounts are controlled by the same person.

ArsenLapin1 · February 10, 2023, 10:32pm

Note that there are a lot fewer games in category B on OGS, not just by the player whose rank you want to know, but also by their opponents. So their opponents also have less reliable ranks in this category, and this is going to contribute to that player’s less reliable rank too, I guess.

(This was pointed out to me recently)

BHydden · February 10, 2023, 11:26pm

In short, anoek tested it and found “more data makes game result prediction more good” regardless of board size and timing options. Of course there might be outliers, but when considering the reliability of the rating system across the site, using just the overall rank returned the best results.

square.defender · February 10, 2023, 11:30pm

It’s true that if someone often plays both ranked 9x9 and ranked 19x19, then it would be more accurate to use separate ranks for 9x9 and 19x19 for matchmaking and handicap purposes.

But people often don’t play enough games in one of the categories. For example, someone could play some 13x13 games while being 20k, then become SDK on 19x19; if they then return to 13x13, they would be forced to be a sandbagger if separate ranks were used. Always using the overall rank automatically fixes that problem. Because of that there may be ±1 rank problems, but it makes sure there are no ±5 rank problems.

square.defender · February 10, 2023, 11:37pm

So I have an idea: what if we allow separate ranks to be used in games, but only IF those ranks are not outdated?

(you can choose between using overall rank and separate rank)

(choosing the separate rank becomes available if you have played enough games recently in that category)
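A rough sketch of that availability check, just to make the idea concrete. The function name and the thresholds (10 games within the last 90 days) are hypothetical choices, not anything OGS implements.

```python
from datetime import date, timedelta


def category_rank_available(game_dates, today, min_games=10, window_days=90):
    """Return True if enough games were played in the category recently.

    `min_games` and `window_days` are illustrative thresholds only.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [d for d in game_dates if cutoff <= d <= today]
    return len(recent) >= min_games


today = date(2023, 2, 10)
active = [today - timedelta(days=i) for i in range(12)]  # 12 games in 12 days
stale = [date(2020, 1, 1)] * 50                          # many games, long ago

print(category_rank_available(active, today))
print(category_rank_available(stale, today))
```

The second case is the 13x13 example above: plenty of games in the category, but all of them too old to trust.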

jlt · February 11, 2023, 7:57am

In my case I play almost only correspondence 19x19 (A) and live 19x19 (B). Almost no blitz, or 13x13, or 9x9. Rank A is consistently stronger than rank B by 2 stones because I spend a lot of time thinking in correspondence games but I’m pretty careless in live games. So it would make sense to use Rank A for correspondence 19x19 games, Rank B for live 19x19 and overall rank for other types of games like blitz 9x9 or correspondence 13x13.

The opposite situation exists as well (players who are careless in correspondence games and thus have a lower correspondence rank than their live rank).

Uberdude · February 11, 2023, 8:07am

At the risk of being a Tony, isn’t the solution obvious? The per-size/speed ratings have their own uncertainty metric. Only use them for pairing if it’s lower than that of the overall rating.
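A minimal sketch of that rule, assuming each rating is a value plus a deviation. The `Rating` class, the `pick_rating` name and the numeric rank scale (larger means stronger) are hypothetical and not taken from the OGS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    value: float      # rank on an assumed numeric scale, larger is stronger
    deviation: float  # uncertainty of that rank


def pick_rating(overall: Rating, category: Rating) -> Rating:
    """Use the per-size/speed rating only if it is more certain than the overall one."""
    return category if category.deviation < overall.deviation else overall


overall = Rating(value=5.0, deviation=0.8)
well_played = Rating(value=3.0, deviation=0.6)    # many games in this category
rarely_played = Rating(value=1.0, deviation=2.5)  # few games in this category

print(pick_rating(overall, well_played))
print(pick_rating(overall, rarely_played))
```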

jlt · February 11, 2023, 8:19am

Suppose the overall rank is 1d ± 0.8 and a partial rank is 2k ± 0.9. The “obvious” solution is to use the overall rank, but also “obviously” the rank 2k ± 0.9 can be considered as reliable enough.

Uberdude · February 11, 2023, 8:29am

Yes, I did consider the case of the partial rank’s uncertainty being only a little larger, but with the ranges disjoint. So the simple greater-than comparison could be tweaked. But I would ask, are such pairs of ratings likely to happen? If you have overall 1d and a partial X at 2k ± 0.9, then you will have some partial Y up at 3d or whatever to pull the average up from 2k to 1d, in which case your mix of 2k and 3d game results means your overall rating is going to have an uncertainty bigger than ± 0.8. Unless they are somehow separated in time in such a way that the rating window slicing is tricked.
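One possible tweak, sketched with the numbers from the example: accept the partial rank when its uncertainty is within a small tolerance of the overall one, and check separately whether the two ranges overlap. The function names, the tolerance of 0.2 and the numeric scale (1d = 0.0, 2k = -2.0) are assumptions for illustration only.

```python
def ranges_disjoint(a_value, a_dev, b_value, b_dev):
    """True if [a - a_dev, a + a_dev] and [b - b_dev, b + b_dev] do not overlap."""
    return a_value + a_dev < b_value - b_dev or b_value + b_dev < a_value - a_dev


def use_category_rating(overall_dev, category_dev, tolerance=0.2):
    """Accept the category rating if it is at most `tolerance` less certain."""
    return category_dev <= overall_dev + tolerance


# Overall 1d ± 0.8 vs. partial 2k ± 0.9 on the assumed scale.
print(ranges_disjoint(0.0, 0.8, -2.0, 0.9))
print(use_category_rating(0.8, 0.9))
print(use_category_rating(0.8, 0.9, tolerance=0.0))
```

With zero tolerance this reduces to the plain comparison and falls back to the overall rank, even though the two ranges do not overlap at all.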

jlt · February 11, 2023, 8:35am

I don’t know. My main account is currently 2.4d±0.8 overall, 2.6d±0.8 for correspondence 19x19 and 2.7k±0.8 for live 19x19 (the latter being a bit outdated). My other account is 1.7k±0.8 for live 19x19.