2021 Rating and rank adjustments

Eugene · January 29, 2021, 11:33pm

At one point there was actually rating decay over time between games, IIRC. Or was it uncertainty increase over time? Something like that anyhow. Do we still have that, does anyone know?

Also, notably, the X axis is time, not games played (as it once used to be, IIRC). It’d be misleading to show a step function between yesterday and the day before if those two games were actually played just before and after midnight.

Plus in general it’d just look ugly if it were a step function graph between days where you hadn’t played.

So one way or another I can’t imagine there being much interest in changing that

willjum · January 30, 2021, 12:07am

Just wanted to let you guys know I went from 13-14k to 9k, which is pretty much my exact ranking on other servers. I think this new rating system will prove to be excellent.

DVbS78rkR7NVe · January 30, 2021, 12:30am

My 19x19 rating is higher than my rating on other sizes. Does that mean that if I stop playing 9x9 and 13x13, I’ll become stronger?

utah2 · January 30, 2021, 12:50am

Eugene, I don’t know yet how to block-quote, but I’m not calling for a step function graph between days where you haven’t played. I’m calling for always immediately showing any change in your rating when you finish a rated game. If your rating changes when you are not playing (Does it?) then that would show up too each day. I would be okay if the graph is continuous to connect your rating at the end of each day, but it is silly to prorate your rating change over a month if you are not playing. If your rating changes in one day, your inactivity over the next several days should not blur the fact that your rating changed in that one day.

Question: Does one’s rating change on days without a rated game? If yes, please show these daily changes. Otherwise, I suggest either make a discontinuous graph for each rated-game result, or assemble all of the day’s results and plot ratings at the end of each day.

If you don’t like discontinuities, okay, how about a daily graph? That would be about the same value as a discontinuous graph in my eyes.

Eugene · January 30, 2021, 12:56am

I thought you were referring to the curve interpolation between the points, and suggesting it should be discontinuous instead.

That’s what it does though?

Each time you finish a rated game, your rating is updated and the graph updated to reflect it.

However, as I mentioned, the X axis is days, not games, so changes on a given day accumulate in the point for that day. There is only ever one point per day, but it does update with each game.

Personally, I’d rather see the X axis in games, so the rate of change on the graph is per game, not per unit of time. I think it was once like that but moved to a time-based X axis when Glicko was introduced … because (if indeed I am recalling correctly) rating changes over time.
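The one-point-per-day behaviour described above can be sketched as follows. This is only an illustration, not the site's actual graph code: the function name `daily_points` and the `(timestamp, rating)` input shape are assumptions made for the example.

```python
from datetime import datetime


def daily_points(games):
    """Collapse per-game ratings into one graph point per calendar day.

    `games` is a list of (played_at, rating_after_game) tuples sorted by
    time. Both the name and the input shape are hypothetical.
    """
    points = {}
    for played_at, rating in games:
        # A later game on the same day overwrites the earlier value, so the
        # day's single point always shows the most recent rating.
        points[played_at.date()] = rating
    return points
```

With two games on the same date, the result holds a single entry for that date carrying the rating after the second game.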

BHydden · January 30, 2021, 12:58am

I think only confidence deteriorates with time, not rating.
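In the published Glicko-2 description, a player who sits out a rating period keeps the same rating while the deviation grows as phi' = sqrt(phi^2 + sigma^2), which matches the idea that only confidence deteriorates. A minimal sketch of that step is below. Whether the server applies it exactly this way, and what it uses as a rating period, are not stated in this thread; the function name is hypothetical.

```python
import math

GLICKO2_SCALE = 173.7178  # conversion factor between RD and Glicko-2 phi


def inflate_deviation(rd, volatility, idle_periods):
    """Return the rating deviation after `idle_periods` periods without games.

    The rating itself is untouched; only the uncertainty grows.
    """
    phi = rd / GLICKO2_SCALE
    for _ in range(idle_periods):
        phi = math.sqrt(phi ** 2 + volatility ** 2)
    return phi * GLICKO2_SCALE
```

For example, `inflate_deviation(60, 0.06, 10)` returns a value larger than 60, while zero idle periods leaves the deviation unchanged.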

Eugene · January 30, 2021, 1:07am

Can you recall other reasons why the rating graph went from “games” to “time”?

BHydden · January 30, 2021, 1:10am

Not certain, but I feel like it possibly predated Glicko? I think it’s been that way for a while… Possibly just a user petition? Both views are valid and subjective.

utah2 · January 30, 2021, 1:34am

Nope, it is not. For example, I played games on Sept 18, Sept 22, and Dec 2 (2020). No games in between. The curve is smooth as if I lost rating points every day between Sept 18 and 22, and it is smooth as if I won rating points every day between Sept 22 and Dec 2.

I would rather it showed the result of a day fully on that day. Either make it discontinuous to do that, or connect end-of-day results with connected line segments that display what I’ve gained or lost each day I’ve played. The curve would be constant on days I don’t play.

I would also be fine showing game-by-game results, like you suggested.
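The "constant on days I don't play" proposal amounts to carrying the last known rating forward instead of interpolating. A rough sketch, assuming a hypothetical `step_series` helper and a dict that maps each played date to the end-of-day rating:

```python
from datetime import date, timedelta


def step_series(points, start, end):
    """Return (day, rating) for every day from start to end inclusive.

    `points` maps a date to the end-of-day rating on days with rated games.
    Days without games repeat the most recent rating, so a change shows up
    entirely on the day it happened. Days before the first game get None.
    """
    series = []
    current = None
    day = start
    while day <= end:
        if day in points:
            current = points[day]
        series.append((day, current))
        day += timedelta(days=1)
    return series
```

Using the dates from the post (Sept 18, Sept 22 and Dec 2, 2020) with made-up ratings, every day from Sept 19 to Sept 21 would show the Sept 18 rating, and every day from Sept 23 to Dec 1 would show the Sept 22 rating.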

ckersch · January 30, 2021, 1:51am

Pro ranks don’t correspond directly to player skill, though. They’re largely honorific, with 9p being awarded in many cases to individuals who win top tournaments or make major contributions to the game. Up and coming lower-rated players are often stronger than their higher rated opponents.

Amateur rankings are more often a direct mapping from some Elo-like value, but even there 9d represents an artificial ceiling. We know that top pros are at least 4 stones or so above that, and current top computers a few stones higher. In Elo-like measurements, the top players are hundreds of rating points above lower-rated 9p players, which simply isn’t reflected in the traditional ranking system.

Even if there’s some asymptotic approach to a top ranking based on handicap, that still isn’t really a skill ceiling, as it’s entirely possible that a better Go playing entity could beat a worse one almost 100% of the time in an even game, but have a worse than 50% record with a 1-stone handicap due to increasingly tight margins of play at the top levels.

meili_yinhua · January 30, 2021, 3:10am

once again, what I consider to be the rank ceiling is not the proper “skill ceiling” that would be reached by a perfect player. It is simply the highest possible rank. in fact, 9d used to be reserved specifically for the provably number one best player alive at the time.

I would argue it’s not an artificial ceiling, it is simply the rank of the best (or in this case, the best amateur) players currently playing.

I’m simply arguing that we have 3 different features of the kyu/dan rank system: a top rank, the distance between ranks, and a bottom rank. However, if you have the top rank defined (9d) and the distance between ranks defined (1 stone), then the bottom rank is determined by those two features and the playerbase (in our case 25k was determined to be that rank for our server). If you defined the bottom rank at 30k and made the distance between ranks the same, you might have a different top rank than 9d, 7d, or what have you. If instead you define 9d as the top rank and 30k as the bottom rank, then the distance between the ranks cannot be guaranteed to always be one stone.

Aumpa · January 30, 2021, 3:51am

I think it’s understood that handicap stones do not provide linear, uniform spacing between ranks at the upper amateur dan levels, nor at the very beginner kyu levels. It might be more accurate to say that, for example, two stones correspond to a two-rank difference only within a limited range, maybe 4 dan (?) to 20 kyu (?). (Which fortunately is helpful for most amateurs.) Beginners don’t know enough about how to use handicap stones, and advanced players use them too well, and the margins of victory are much tighter. This non-linear quality of handicap stones doesn’t mean that there aren’t much wider, measurable skill differences above and below the effective range for handicap stones.

meili_yinhua · January 30, 2021, 4:03am

I was more going off some of the stated goals that anoek has mentioned on other forum posts to have ranks line up with 1 stone as much as possible. We have brought up the fact that in olden times dan ranks were separated by half a stone, but that was responded to by stating that that is not quite relevant, and that maybe they should be separated by one stone.

This is actually accounted for in the rating-rank conversion, as the higher ranks require more Elo/Glicko rating points to achieve the same stone rank difference. This is why the conversion uses the natural logarithm.
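To see why a logarithmic conversion makes higher ranks cost more rating points, here is a small sketch of a conversion of the form rank = C * ln(rating / A). The constants `A` and `C` below are assumptions chosen for illustration and are not confirmed anywhere in this thread; the function names are hypothetical too.

```python
import math

A = 525.0   # assumed scale constant, illustrative only
C = 23.15   # assumed rank multiplier, illustrative only


def rating_to_rank(rating, a=A, c=C):
    """Map a rating to a continuous rank number via a natural logarithm."""
    return c * math.log(rating / a)


def rank_to_rating(rank, a=A, c=C):
    """Inverse of rating_to_rank."""
    return a * math.exp(rank / c)


def points_per_rank(rank):
    """Rating points needed to climb from `rank` to `rank + 1`."""
    return rank_to_rating(rank + 1) - rank_to_rating(rank)
```

Because the inverse is an exponential, `points_per_rank` grows with rank: one rank near the top spans more rating points than one rank near the bottom, whatever the exact constants are.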

runarberg · January 30, 2021, 4:41am

What happens to the puzzle page now… Are puzzle authors responsible to edit the rank field in their puzzles, or will that happen (or has happened) automatically?

meili_yinhua · January 30, 2021, 5:40am

luckily tsumego has a tradition of mismarking ranks on their puzzles, so it’ll be business as normal

gr8m8 · January 30, 2021, 5:50am

Awesome! Always kind of wondered what was up with the volatility, especially at lower ranks.

As a fellow dev, this explains a lot… ^^ Great job finally stamping this issue out! Excellent job @flovo & @anoek !

gennan · January 30, 2021, 10:52am

The OGS 25k lowest rank is not a rating floor. Ratings can go lower, but they are just displayed as 25k. The same goes for the highest rank of OGS 9d. It is not a rating ceiling.
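The difference between a display limit and a rating floor can be shown with a small sketch. It assumes a numeric rank scale where 0 is 30k, 29 is 1k, 30 is 1d and 38 is 9d; that scale and the name `format_rank` are assumptions for illustration, not something stated in this thread.

```python
import math


def format_rank(rank, lowest=5, highest=38):
    """Format a numeric rank for display, clamping only the label.

    Assumed scale: 0 = 30k, 29 = 1k, 30 = 1d, 38 = 9d, so the defaults
    clamp the label to the 25k..9d range. The underlying number passed in
    is not modified and can lie outside that range.
    """
    clamped = max(lowest, min(highest, rank))
    whole = math.floor(clamped)
    if whole >= 30:
        return f"{whole - 29}d"
    return f"{30 - whole}k"
```

A player whose numeric rank is 2.3 is still labelled 25k, and one at 41.0 is still labelled 9d, but the stored value keeps moving underneath the label.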

AFAIK the OGS rating system has no clearly defined rating anchor. The thing that comes closest to a rating anchor is the initial OGS 13k / 1150 rating assigned to new members. I don’t know where that number comes from exactly, but for OGS it seems to work fairly well as an anchor.

The EGF rating system has a different rating anchor: 7d EGF signifies a boundary between amateurs and professionals.
This boundary is a bit fuzzy, because some amateurs are stronger than this level and some pros are weaker than this level. But overall, it seems to hold up fairly well.

From traditional pro handicap as used in the (now abolished) Japanese Oteai pro ranking competition, the gap between pro ranks is assumed to be 1/3 of an amateur rank. So the gap between 9p and 1p/7d EGF is assumed to be about 2.5 amateur ranks, corresponding to 3 stones handicap, which was the traditional handicap in the Edo period for games between a Meijin (~World Champion) and a 1p/7d EGF.

This means that a World Champion would have a level of about 9.5d EGF. Some rare go geniuses may even be a bit stronger than that. Go geniuses, like Honinbo Dosaku, Honinbo Jowa, Go Seigen, Lee Changho and perhaps Shin Jinseo may have peaked out as high as 10d EGF.

So you could say that the EGF rating system is more or less anchored to world champion level being about 9.5-10d EGF.
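The arithmetic behind that estimate can be written out as a quick sketch. It takes the two assumptions from the post (1p is about 7d EGF, and each pro rank is worth 1/3 of an amateur rank); the function name is hypothetical.

```python
PRO_GAP_IN_AMATEUR_RANKS = 1 / 3   # from the Oteai handicap convention above
AMATEUR_DAN_OF_1P = 7.0            # 1p taken as roughly 7d EGF


def pro_rank_as_amateur_dan(pro_rank):
    """Express a pro rank (1..9) on the amateur dan scale."""
    return AMATEUR_DAN_OF_1P + (pro_rank - 1) * PRO_GAP_IN_AMATEUR_RANKS
```

Eight gaps of 1/3 rank give about 2.7 amateur ranks between 1p and 9p, which lands 9p between 9.5d and 10d on this scale, in line with the rough figures quoted in the post.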

At the bottom, the EGF rating system has a rating floor. It used to be 20k, but it will soon be lowered to 30k EGF to accommodate weaker players (especially children) participating in beginners/children tournaments, which are becoming more common.

In my experience, 30k is a ballpark estimate of the level of an average adult novice who has just finished a beginners course in a club. They know the rules, they know about life and death, ladder, net, snap-back and seki and they rarely need help finishing and scoring their games. There will be a large individual variation of course, but I estimate that 90% of adult novices will have a level between 35k and 25k EGF. In my club, I (3d EGF) give 30k players 6 stones handicap on 9x9.

When these adult novices continue playing and get some tuition, I think their level will usually go up to 20k-15k EGF after playing some 100 games. I give such players 4-3 stones handicap on 9x9.

meili_yinhua · January 30, 2021, 12:11pm

So you make some fair points in the rest of this post, and I don’t feel like I can contribute much more to the discussion of the main topic.

But the secret to the 13k / 1150 “anchor” is that it’s not the real anchor behind the scenes. That’s 1500 (somewhere mid-6k by the current conversion). The 1150 comes from a compromise made when Glicko-2 was first being implemented: people rated at 1500 would often complain about all the games against beginners, as we could no longer choose ranks to begin with. So for provisionally rated players, the system displays and matchmakes as if your rating were your_rating - your_rating_deviation, while otherwise doing the math as normal for rating updates. Someone with no games would, behind the scenes, have a rating of 1500 and an RD of 350, so the displayed rating would be 1500 - 350 = 1150. This compromise was called “humble rank”, and the complaints seem to have disappeared since its implementation.
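The “humble rank” rule described above boils down to one subtraction. A minimal sketch, where the function name and the `provisional` flag are assumptions for illustration and the real implementation may differ:

```python
def displayed_rating(rating, deviation, provisional):
    """Rating used for display and matchmaking under the humble-rank rule.

    Provisional players are shown at rating minus deviation. The stored
    rating used for the actual rating updates is left unchanged.
    """
    if provisional:
        return rating - deviation
    return rating
```

For a brand-new account, `displayed_rating(1500, 350, True)` gives 1150, matching the 1500 - 350 = 1150 figure in the post.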

utah2 · January 30, 2021, 8:35pm

I am curious as to the reason for the floor and the ceiling. Why bother?

Why not let the rating system do its job? My preschooler may soon be playing 9x9 games online. If there is no floor (as I suggest), I see these advantages:

He will be able to find someone of similar strength to play, making a win 50/50 or so.
He will be able to track his improvement.
He won’t have to lose dozens of times with no chance to enjoy much winning and to no benefit. Winning is fun. Why do you make winning an experience you won’t let him enjoy?
If he hates being 45 kyu, I can comfort him and say that what matters is not where you are now, it is instead where you end up.

Similarly, why won’t you allow a 10 dan rank if a player—whether amateur, pro, or bot—earns it by being 1 stone stronger than the average 9 dan here?

Why pretend that no player is a stone stronger than our average 9 dans, and why pretend that no player is two stones or more weaker than our average 24 kyu? Let the rating system do its job to show what handicaps are likely to result in a 50/50 game.

Most people temporarily rated below 25 kyu assume they are likely to improve, and they’re generally right. Let them see their improvement over time.

I would make the lower limit 99 kyu, if you have to choose a number. You can announce that. Almost everyone will be better than 99 kyu. Maybe exactly everyone. I see no downside.

I would leave the upper limit undefined. You can announce that. Let the best players come here to demonstrate their excellence. I see no downside.

Zbingu · January 30, 2021, 9:14pm

The problem with very low ranks is that handicap is not meaningful past a certain point. If both players are blundering 10-15 stones without compensation semi-frequently, a 9 stone handicap won’t have any effect on the result. The player making the fewest massive mistakes will win with any reasonable handicap.