2020 Rating and rank tweaks and analysis

I can already imagine all the complaints about “it says that this guy has been a dan and now I have to give them handicap? TF?”

1 Like

The information is already there in the graphs on the profile page. This would just make the maximum rating / rank information more salient and easily accessible. GoQuest uses the maximum rank attained as the display rank and I quite like the idea, because it cuts out all the fluctuation, and only changes when you reach a higher rank than that previously attained. This is the other aspect of the idea that I like: when the maximum rank attained goes up it offers a concrete measure of progress.

Nah, considering how OGS rank fluctuates and how it can be easily cheated, it would be some silly number probably achieved by using imperfections in OGS ranking. So it’s not a concrete measure of progress in any way.

When we have weird rankings to begin with, putting even more importance on top of them won’t make anything easier.

I wouldn’t mind a separate progress ranking, but it should be something more solid and less dependent on rank fluctuations. For example, winning 15 of the last 30 games against players of a given rank.
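A criterion like the one proposed above could look something like this. The function name, the `(opponent_rank, won)` record format, and the exact thresholds are my own assumptions for illustration, not anything OGS implements:

```python
def confirms_rank(results, target_rank, min_wins=15, window=30):
    """Hypothetical progress check: did the player win at least
    `min_wins` of their last `window` rated games against opponents
    at `target_rank` or stronger?

    results: chronological list of (opponent_rank, won) tuples.
    """
    vs_rank = [won for rank, won in results if rank >= target_rank]
    return sum(vs_rank[-window:]) >= min_wins
```

The appeal of a rule like this is that it can only be satisfied by actually winning games, so it doesn’t inherit the fluctuation (or the exploits) of the underlying rating.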

4 Likes

I guess it sounds similar to the Fox-style rank increases, although I don’t know the specifics of those. Does it not care whom you challenged to rank up, as long as it’s a rated game, or is it restricted to some rank range, etc.?

Maybe more of a hybrid of the two: as with GoQuest you can’t rank down, but you have to perform well against a certain rating in x recent games to say you’re at least that rating (converted to rank).

2 Likes

True, probably the raw highest rank reached wouldn’t make sense. But it wouldn’t be hard to produce a more reasonable measure, for instance the highest rank sustained for x games.
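The “highest rank sustained for x games” idea sketched above could be computed with a simple sliding-window minimum. This is just an illustration under my own assumptions (a per-game rank history as a list of integers, `x` consecutive games, hypothetical function name):

```python
def highest_sustained_rank(rank_history, x=10):
    """Highest rank r such that the player held rank >= r for at
    least x consecutive rated games.

    rank_history: chronological list of the player's rank after
    each rated game (higher number = stronger rank).
    Returns None if fewer than x games have been played.
    """
    best = None
    for i in range(len(rank_history) - x + 1):
        # The rank sustained over this window is its weakest point.
        window_min = min(rank_history[i:i + x])
        if best is None or window_min > best:
            best = window_min
    return best
```

Because it takes the minimum over a window, a single lucky spike in rank doesn’t count; the rank has to be held.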

I know it’s just one more lowly voice, but it is very demotivating to win a game against a stronger player and have your rank go down as a result :slight_smile: Chalk up one more vote that this is a behavior that’s less likely to keep people involved and playing.

4 Likes

Could anyone explain this for me? How does this metric show how well the system is doing? Winrates are averages, and they can be close even when the individual predictions are completely wrong.

I was expecting to see some kind of binary classification measure there, even a basic “correctly classified / total samples”.

1 Like

Well, to me it seems fairly standard from the point of view of “this is roughly how far it spreads from the expected score on average”.

The problem with “correctly labelled / all samples” is that it doesn’t take into account that the system isn’t trying to get correctly labelled samples; it’s trying to move each player toward an estimated rank that’s as close to that player’s true rank as possible. Because of this, a major upset (such as a 13k beating a 1d) should be a bigger source of error than a minor upset (such as a weak 13k beating a strong 13k). I’m not 100% behind the measure that they used, but it’s not completely worthless.

Now my gut (naive) reaction would be to do something like this: for each win add (1 − Expectation), and for each loss add Expectation (and then probably divide by the number of data points for a tractable number); that way you’re getting the error of each game.

However,

this is a problem that has been considered by people who design rating systems, notably in a competition held by FIDE (Deloitte/FIDE Chess Rating Challenge | Kaggle). Glickman himself not only competed in it (and came fifth with his Glicko-Boost algorithm), but, as chairman of the US Chess ratings committee, suggested the measure by which entries would be scored, which is described at the link above and quoted below:

The evaluation function used for scoring submissions will be the “Binomial Deviance” (or “Log Likelihood”) statistic suggested by Mark Glickman, namely the mean of:

-[Y LOG10(E) + (1-Y) LOG10(1-E)]

per game, where Y is the game outcome (0.0 for a black win, 0.5 for a draw, or 1.0 for a white win) and E is the expected/predicted score for White, and LOG10() is the base-10 logarithm function.
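The quoted formula translates directly into code. A self-contained sketch (function name and input format are my own choices):

```python
import math

def binomial_deviance(games):
    """Mean of -[Y*log10(E) + (1-Y)*log10(1-E)] per game, as in the
    Deloitte/FIDE Chess Rating Challenge evaluation.

    games: list of (expected_white_score, outcome) pairs, where
    outcome Y is 0.0 (black win), 0.5 (draw), or 1.0 (white win)
    and E is the predicted score for White, with 0 < E < 1.
    """
    total = 0.0
    for e, y in games:
        total += -(y * math.log10(e) + (1 - y) * math.log10(1 - e))
    return total / len(games)
```

Lower is better: a coin-flip prediction (E = 0.5) costs log10(2) ≈ 0.301 per decisive game, while a confident prediction that turns out wrong costs far more, which is exactly the “upsets hurt” property discussed above.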

There isn’t a full explanation for the choice, but I would imagine its resemblance to the binary entropy function is largely the reason: it’s a measure of how surprising a result is, given a predicted probability for a binary outcome.

3 Likes

Right, this is standard for a binary classification task. If I saw that, I wouldn’t be surprised at all. That’s why I’m asking.

2 Likes