I think the 13k default rank is doing harm [Closed]

DVbS78rkR7NVe · September 30, 2018, 9:26pm

Actually, the spike is the OGS devs’ achievement. OGS v5 was released in Feb 2017, and as a result the number of new accounts dropped, but wait! The number of new accounts that played at least one game stayed the same! Maybe the new version somehow makes it harder to register throwaway accounts?
And I don’t know why the orange line spiked. Maybe OGS got better?

And the downturn at the Glicko release looks significant. The blue line would naturally stay the same, because Glicko doesn’t change people's willingness to play their first game. However, the orange line dropped from 0.5 to 0.4 (a bit less, actually). That means 1 more person out of every 10 who were willing to play 1 game quit before reaching 5 games.

flovo · September 30, 2018, 9:53pm

@anoek thank you for your hard work.

Assuming most new players are new to go as well, this would stabilize the rating of new players around 1500. Glicko separates players only relative to each other. A supposed 25k would win about 50% of her games against her fellow 25ks. Glicko would then decide that both players are rated correctly, leaving both at 1500 and lowering only their deviation.

Maybe it’s the mean, but it’s neither the median nor the mode. Over time, though, there would be rank deflation if one changes the entry point for all new players. Balancing 3 entry points in a way that prevents inflation/deflation would be a real hassle.

The Glicko2 specification fixes the entry point to one value (I'm not talking about guessing the initial ranks of new players):

(a) If the player is unrated, set the rating to 1500 and the RD to 350. Set the player’s volatility to 0.06 (this value depends on the particular application).
(b) Otherwise, use the player’s most recent rating, RD, and volatility σ.
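The 25k example above (equal results against an equal opponent leave the rating at 1500 and only shrink the deviation) can be checked with a minimal Python sketch of one Glicko-2 rating-period update. It follows the steps in Glickman's Glicko-2 description. Several pieces are assumptions and not OGS's code: the names `Player` and `update`, and the system constant `tau = 0.5`. The `rating` default can be overridden to try a different entry point.

```python
import math
from dataclasses import dataclass

SCALE = 173.7178  # Glicko-2 conversion between rating and internal scale
TAU = 0.5         # system constant; assumed value (application-dependent)
EPSILON = 1e-6    # convergence tolerance for the volatility iteration


@dataclass
class Player:
    rating: float = 1500.0   # (a) unrated players start at 1500 ...
    rd: float = 350.0        # ... with RD 350 ...
    volatility: float = 0.06  # ... and volatility 0.06


def _g(phi: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def update(player: Player, results, tau: float = TAU) -> Player:
    """One rating period; results is a list of (opponent Player, score)."""
    mu = (player.rating - 1500.0) / SCALE
    phi = player.rd / SCALE
    if not results:
        # No games: only the deviation grows.
        phi_star = math.sqrt(phi ** 2 + player.volatility ** 2)
        return Player(player.rating, phi_star * SCALE, player.volatility)
    v_inv, delta_sum = 0.0, 0.0
    for opp, score in results:
        mu_j, phi_j = (opp.rating - 1500.0) / SCALE, opp.rd / SCALE
        e = _expected(mu, mu_j, phi_j)
        v_inv += _g(phi_j) ** 2 * e * (1.0 - e)
        delta_sum += _g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * delta_sum
    # New volatility via the Illinois iteration from the Glicko-2 description.
    a = math.log(player.volatility ** 2)

    def f(x: float) -> float:
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        return num / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    big_a = a
    if delta ** 2 > phi ** 2 + v:
        big_b = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        big_b = a - k * tau
    f_a, f_b = f(big_a), f(big_b)
    while abs(big_b - big_a) > EPSILON:
        big_c = big_a + (big_a - big_b) * f_a / (f_b - f_a)
        f_c = f(big_c)
        if f_c * f_b <= 0:
            big_a, f_a = big_b, f_b
        else:
            f_a /= 2.0
        big_b, f_b = big_c, f_c
    sigma = math.exp(big_a / 2.0)
    # Pre-period deviation, new deviation, new mean, back to the rating scale.
    phi_star = math.sqrt(phi ** 2 + sigma ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * delta_sum
    return Player(SCALE * mu_new + 1500.0, SCALE * phi_new, sigma)


if __name__ == "__main__":
    a, b = Player(), Player()
    after = update(a, [(b, 1.0), (b, 0.0)])  # one win, one loss vs an equal
    print(after.rating, after.rd)            # rating stays 1500, RD shrinks
```

A win and a loss against an equally rated opponent make the score surplus exactly zero, so the mean does not move. Only the deviation drops. Passing `Player(rating=1200.0)` shows that nothing in the update depends on 1500 being the starting point.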

@smurph
I cannot recalculate the whole rating pool. I can’t get the ratings of GnuGo and Master Mantis to stabilize; at some point their ratings start to skyrocket. They played too many games to just ignore them.

And I don’t understand what you want me to do anyways. Change the initial rating of some players and see what happens?

Conrad_Melville · September 30, 2018, 9:59pm

An excellent point. I withdraw my suggestion.

smurph · September 30, 2018, 10:03pm

Nevermind the orange line, what’s up with green?

Also, a point I’ve made elsewhere about a similar question of proportions: even if the ratio of 5+ gamers to 1-gamers decreased (I can’t tell from the squiggly lines), the number of 5+ gamers still increases over time. If that is true (again, hard to tell from the squiggly lines), all it would tell us is that the increase in 0/1-gamers does not translate into a proportional increase in 5+ gamers.

My guess would be that those 5ers would have played more than 1 game regardless of other factors (perhaps they’re repeat customers? what is the average win/loss for each category?), whereas 0/1ers might belong to a different population.

@flovo
Nah, I don’t mean the actual data, just mock data (with a plausible skill distribution). If I knew how to implement Glicko in a simulation I would do so myself, but alas. If a systematic measure like letting people choose their matchmaking rank has a systematic effect, we should see it regardless of whether the data is real or fictitious. I suppose the best measure of impact would be to have the mock pool play some games and see which method produces the most accurate predictions of who wins.
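For the mock-data idea, here is a minimal Python sketch. A pool with hypothetical, normally distributed "true" skills plays random games. Each entry policy is scored by the RMS of (outcome - predicted win probability), the same measure flovo uses in his analysis. To stay short, the sketch swaps Glicko for a plain Elo update. The skill distribution, the K factor, and both seeding policies are assumptions chosen for illustration.

```python
import math
import random


def win_prob(r_a: float, r_b: float) -> float:
    """Logistic expected score for A on the usual Elo 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def seed_flat(skill: float, rng: random.Random) -> float:
    """Everyone enters at the same default rating."""
    return 1500.0


def seed_self_reported(skill: float, rng: random.Random) -> float:
    """Players pick an entry rating that is right on average but noisy."""
    return skill + rng.gauss(0.0, 200.0)


def simulate(seed_fn, n_players=200, n_games=5000, k=32.0, seed=1) -> float:
    """Return the RMS of (outcome - predicted win probability) over all games."""
    pool_rng = random.Random(seed)      # true skills, same for every policy
    seed_rng = random.Random(seed + 1)  # noise used by the seeding policy
    game_rng = random.Random(seed + 2)  # pairings and game results
    skills = [pool_rng.gauss(1500.0, 300.0) for _ in range(n_players)]
    ratings = [seed_fn(s, seed_rng) for s in skills]
    sq_err = 0.0
    for _ in range(n_games):
        a, b = game_rng.sample(range(n_players), 2)
        p = win_prob(ratings[a], ratings[b])  # prediction before the game
        won = game_rng.random() < win_prob(skills[a], skills[b])
        outcome = 1.0 if won else 0.0
        sq_err += (outcome - p) ** 2
        ratings[a] += k * (outcome - p)       # zero-sum Elo update
        ratings[b] -= k * (outcome - p)
    return math.sqrt(sq_err / n_games)


if __name__ == "__main__":
    print("flat 1500:    ", simulate(seed_flat))
    print("self-reported:", simulate(seed_self_reported))
```

Separate random generators keep the true skills, pairings, and game results identical across policies, so only the entry ratings differ between runs.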

Jokes_Aside · September 30, 2018, 10:33pm

Balancing is not necessary. The pool of active players constantly changes, so the effective average rank of active players fluctuates constantly and no one tries to counter it. There’s no reason to believe that providing different entry points would affect the active rank pool more than inactive players simply refusing to play anymore.

It does so for simplicity of argument. There’s nothing particularly magical about the value 1500 in that document. The model doesn’t break if different starting points are used. You can find implementations of Glicko-2 algorithm which allow custom initial placement.

trohde · September 30, 2018, 11:21pm

@anoek:

What about making it an option then? I’m sure that a lot of new members who already have a clue (e.g. about rank), no matter how little, would happily solve a few puzzles in order to get a more fitting rank.

And those who think, “ah no, I don’t have a clue” could click a “clueless” radio button and be assigned 25k or something?

meili_yinhua · October 1, 2018, 12:27am

I’ve just gotta say that even Glicko-2 doesn’t do this. In fact, Glicko-2 is not a zero-sum rating system like Elo, so the average rating of all rated players can (and often does) shift up and down over time. The single-point average of Glicko is nothing more than an arbitrary point, a relic of Elo ratings.

However, following it has been shown to the devs to be more accurate (at least from a pure % wins standpoint) than not doing so, so they chose to follow Glickman’s original implementation.

Their implementation of the rating period is not 100% perfect, though. This quote from flovo shows it: he modified part of OGS’s rating period system to correct for the current ratings more accurately (although he only used his own games, and the test is inconclusive):

Flovo's analysis

Rank Instability on OGS

I have first estimates.
I used the opponent ratings as provided by OGS (no recalculation of their rating history).
I only recalculated my rating history with OGS like Glicko2 (again with opponent ratings provided by OGS).

rms is calculated over (outcome - win_probability)
direct % is calculated as 1 - Σ(outcome XOR (player_rating > opponent_rating)) / number_of_games
(outcome = 1 if the player wins, and outcome = 0 if the opponent wins)

Prediction quality of my rating history: (rms: lower is better, direct %: bigger is better)
Player id: 449941
Games: 978

algorithm   rms      direct %
OGS         0.4645   64.93%
Glicko2     0.4637   65.95%

There is no difference in predictability.

To get the right values for Glicko2, I would have to recalculate the histories for the whole player base. At the moment I’m not able to do that (rating history gets cropped to max 5000 games).

EDIT: Forgot to adjust ratings for handicap games in the calculation of the direct %. Now both are slightly higher.
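The two metrics in flovo's analysis can be computed roughly as follows. This is a sketch: the function names are hypothetical, and a rating tie counts as predicting a loss, as the ">" in the formula implies. The handicap adjustment mentioned in the EDIT is not modeled.

```python
import math


def rms_error(outcomes, win_probs):
    """RMS of (outcome - win_probability); outcome is 1 for a win, 0 for a loss."""
    squared = [(o - p) ** 2 for o, p in zip(outcomes, win_probs)]
    return math.sqrt(sum(squared) / len(squared))


def direct_accuracy(outcomes, player_ratings, opponent_ratings):
    """1 - (games where the outcome disagrees with the rating order) / games."""
    misses = sum(
        (o == 1) != (pr > opr)  # XOR of "player won" and "player rated higher"
        for o, pr, opr in zip(outcomes, player_ratings, opponent_ratings)
    )
    return 1.0 - misses / len(outcomes)


if __name__ == "__main__":
    outcomes = [1, 0, 1]
    print(rms_error(outcomes, [0.6, 0.4, 0.3]))
    print(direct_accuracy(outcomes, [1600, 1400, 1400], [1500, 1500, 1500]))
```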

It is worth mentioning that it may be more accurate to have people seeded in at the overall site’s average (or maybe even just the average for players with a low enough deviation), but that would probably be for little gain (and might even be for a loss).

smurph · October 1, 2018, 12:55am

I personally like the idea of giving new players the opportunity to “take a quiz” (because a quiz isn’t as intimidating as a test ;D ). I think it would make for a great button on top of the HOME or PLAY page.

It would be even more useful if there was a tutorial that explained the ins and outs of OGS. I know there’s something like that, but let’s be honest. Most of the time, the people who are curious enough to dig through that stuff or carefully read all available information on this forum are not the people who ask questions.

That said, we can approach this logically and break the new accounts down into the following groups:

1. Alter egos. Familiar with the game, familiar with the website. They need no introduction.
2. Expats. Familiar with the game, unfamiliar with the website. They could use a brief introduction to how OGS works, including information on the rating system, how to get a game and how moderation works.
3. Beginners. Unfamiliar with the game, unfamiliar with the website. They could use an elaborate introduction to the game (http://playgo.to/iwtg/en really has no peer even to this day) and a brief introduction to the website, same as Expats.
4. The obsCure. Unfamiliar with the game, but familiar with OGS.

Now the question is who exactly do we want to cater to? It’s not [1] for obvious reasons, it isn’t [4] because they probably don’t exist and [2] will figure it out on their own, mostly because they know where they should be rankwise and the rest is just a matter of tenacity.

The only group that would benefit from something like this is absolute beginners (I’m not even sure it’s that large a group) and as far as I’m concerned, testing for just how much of an absolute beginner people are wouldn’t help much with matchmaking (they will lose most of their games either way) nor with keeping them around.

My guess is… if people want to stick around, they just will.

ckersch · October 1, 2018, 1:09am

I wonder if just adding “I am a beginner” as a checkbox option while starting a new account would help things. Players who aren’t beginners could reasonably start at 13k and get to their real ranking within a few games. Most players will probably be in the 7k to 17k range, which won’t require long to stabilize, and having a dual entry point system will keep beginners from entering as Dan-ranked players and messing with that part of the game pool.

For beginners, they could be given a Glicko rating of 1500, as per usual, but a ranking of 25k, (as has been suggested by several other people), which would start them at the bottom of the rating pool for actual games. That way, beginners could play other beginners, without needing to get beaten down from 13k to 25k, first. Taking all of the true beginners out of the 13k pool would at least somewhat lessen the instability at that rank, as well.

Some players will sandbag and start themselves off as beginners instead of 13ks. But they’ll have high uncertainty and an actual rating of 1500, so they’ll only be there for a few games. Regardless, it seems likely that fewer people will be mis-rated through intentional sandbagging than are mis-rated through the 13k default rating.

Finally, I do think it would be a good idea to preferentially auto-match new accounts with other new accounts. Doing so would at least provide some information as to where new accounts belong, before they get dropped into the general rating pool.

Maharani · October 1, 2018, 4:20am

Just dropping in to say that I still like the idea of humble rank, and it’d have my vote in a poll

Eugene · October 1, 2018, 4:27am

Me too. I don’t understand @anoek’s “refutation” of it.

I don’t see how it has any consequences on the rating system, since it doesn’t change it at all.

It is not the case that all it does is move the problem down. It moves the problem away from people whose ratings are known with good certainty, and confines it to only those people whose rating we don’t know. That’s actually one of its many beauties.

While it is true that a new, uncertain, dan could be paired with a new uncertain TPK, the dan won’t be there for long, while the TPK will stay where they fit.

GaJ

anoek · October 1, 2018, 4:42pm

Here’s my understanding of the humble rank proposal: When a player’s rating deviation is high, pretend like the player’s rank is at the bottom of the deviation band for the purposes of matchmaking and display.

While trying to expound upon my initial refutation, I think I might have talked myself into thinking it’s probably a pretty good idea again.

The thing I like about the idea is that it should be a lot friendlier for beginners, since they will be matched with other beginners more often. It’ll probably take a few more games for dans to reach their appropriate rank. That shouldn’t be too bad, though, since Glicko will accelerate their placement the more games they win in a row.

The thing I fear is that it’ll mess with the rating curve more than just a little bit. I suspect that it’ll become hard for legitimate 20-15k’ers to not be buoyed up in rank continuously, which in turn will buoy everyone above them. This kinda happens now anyways with matching people at 13k, however the effects will be more pronounced because of the increased rating gap between a 13k and a 20k (that is to say, the 20k will win more points from a victory with a 13k than a 13k v 13k). This effect will be compounded by the fact that the number of rating points between ranks is lower near 20k than 13k, so ranks are going to jump around a lot more.
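The point about rating points per rank can be checked under a logarithmic rating-to-rank mapping. The sketch below assumes a mapping of the form rank = ln(rating / A) / B, with numeric ranks where 0 = 30k and higher is stronger. The constants `A` and `B` are hypothetical placeholders, not confirmed OGS values. Under any mapping of this shape, the gap between adjacent ranks grows in proportion to the rating, so it is smaller near 20k than near 13k.

```python
import math

A = 850.0   # hypothetical scale constant
B = 0.032   # hypothetical slope constant


def rating_to_rank(rating: float) -> float:
    """Numeric rank (0 = 30k, higher is stronger) under the assumed log mapping."""
    return math.log(rating / A) / B


def rank_to_rating(rank: float) -> float:
    """Inverse of rating_to_rank."""
    return A * math.exp(B * rank)


def points_per_rank(rank: float) -> float:
    """Rating points needed to climb from `rank` to `rank + 1`."""
    return rank_to_rating(rank + 1) - rank_to_rating(rank)


if __name__ == "__main__":
    for label, rank in (("20k", 10), ("13k", 17)):
        print(label, round(points_per_rank(rank), 1))
```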

I’ll need to run some simulations, which is what I said a while ago. I’m now in a better place with the other code to actually look at this, so I’ll see if I can make some progress on that front this week.

smurph · October 1, 2018, 4:51pm

Thanks for your continued hard work.

And even though high DDKs might experience more volatile ratings, I think that’s still a pretty good reflection of what happens at those beginner ranks. There's a lot of fluctuation, because their games are close to random anyway (and I say this with all the love for newbies).

ckersch · October 1, 2018, 6:27pm

My thoughts on the humble rank:
-High DDKs (i.e. around 20k) will end up playing a lot more strong players, since that will be the entry point for everyone.
-Everyone will decrease in rank by 1-2 stones, since that’s about as low as uncertainty gets for anyone. (Fuego, with tons of games, is at +/- 1.4). This isn’t the end of the world, since everything would be internally consistent, but OGS ranks would be out of line with everything else, with OGS 1d corresponding to something between 2d and 6d for other servers and national rating systems.
-This probably won’t inflate any ranks. Initial wins vs. absolute beginners (<5 games) don’t give a large rating boost, so these people losing to 20ks instead of 13ks won’t make much of a difference to ranks. (…although new players have a rating of 1500, so these will be wins against stronger, if uncertain, players for the 20ks. Not sure where that leaves us for how much impact these games will have). On the other hand, strong players will have a lower deviation once they get up to 13k, so they’ll take more rating away from DDKs on their way up the ranks. I suspect the number of games played by reasonably strong (SDK+) players on new accounts is fairly low compared to the total number of games being played, so this probably won’t strongly impact ranks.

If we want to implement the humble rank but have it really affect only high-uncertainty accounts, we could instead compute rank = mean rank - std. dev. + minimum std. dev. (~1.4?). This would eliminate the second issue, but not the first.

If we wanted to potentially eliminate both issues, we could add a “humble rank, but only for beginners” option. For this, we’d need to add an option to check “I am a beginner” during account creation. Beginners would be ranked based on humble rank + 2 or so, and would revert to a normal rank after their uncertainty dropped to 2 or below, at which point they’d drop into the main rating system, without having their rank jump at all. Ideally, this would mean that strong players would enter the system at a rank of 13k, weak players would enter at 20+k, and both would have the same rating, so as to not mess with Glicko. Weaker players could avoid getting beaten down by 13ks, and fewer 13ks would get matched with ? accounts, hopefully reducing uncertainty at that rank, somewhat.
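The beginner-only variant can be written as a small function. This sketch assumes that deviation is expressed in ranks, that numeric ranks increase with strength, and that the threshold is the suggested value of 2. The function name is hypothetical. Adding the threshold back makes the displayed rank equal the normal rank exactly when the deviation falls to 2, so the rank does not jump.

```python
def beginner_display_rank(mean_rank: float, deviation: float,
                          is_beginner: bool, threshold: float = 2.0) -> float:
    """Hypothetical 'humble rank, but only for beginners' rule.

    Ranks are numeric (higher is stronger) and `deviation` is in ranks.
    """
    if is_beginner and deviation > threshold:
        # Humble rank (mean - deviation) shifted up by the threshold, so it
        # meets the normal rank exactly when the deviation reaches the threshold.
        return mean_rank - deviation + threshold
    return mean_rank


if __name__ == "__main__":
    print(beginner_display_rank(17.0, 6.0, True))   # 13.0
    print(beginner_display_rank(17.0, 6.0, False))  # 17.0
```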

On a related note, based on the data that we have, can we plot the actual average ranks of people entering the system? My suspicion, if we look at the first 5 games that people will play and compute a Glicko based on those games, is that it’ll be a bimodal distribution, representing a peak of true beginners at TPK somewhere, and a second peak of players with accounts elsewhere, likely closer to 13k. For the “elsewhere” players, we could potentially add a second humblerank-like adjustment to represent their most likely rank, which would similarly be dropped as their rank uncertainty fell.

Likewise, if we know the average ratings of true beginners, we could add a rank adjustment of (stDev-2)*(initialHumbleRank-trueBeginnerRank), assuming a threshold uncertainty of 2 for where players would re-enter the rating system. (initialHumbleRank would be 13k-newAccountUncertainty and trueBeginnerRank would be the actual mean rating of players checking the “I am a beginner” box).

anoek · October 1, 2018, 7:46pm

The way I understood it (and was thinking about it) is that the humble rank would only apply when your deviation is high. We’d interpolate between a player's humble rank and their current average / expected rank based on deviation. Off the cuff, I’d say that if your deviation is ±150 or better, we display your expected rank. For anything less certain, we interpolate between your humble rank and that rank, so it’s a smooth path to your expected rank.
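The interpolation can be sketched as follows, working in rating points rather than ranks. The 150-point threshold is the off-the-cuff figure above. Two choices are assumptions: using 350 (the Glicko-2 starting RD) as the point of full humility, and interpolating linearly. The displayed rating slides from rating - deviation for a brand-new account to the expected rating once the deviation reaches 150.

```python
def display_rating(rating: float, deviation: float,
                   full_trust: float = 150.0, new_player_rd: float = 350.0) -> float:
    """Blend the humble rating (rating - deviation) into the expected rating."""
    if deviation <= full_trust:
        return rating  # certain enough: show the expected rating
    # 0 at the trust threshold, 1 at (or above) a brand-new account's RD
    t = min(1.0, (deviation - full_trust) / (new_player_rd - full_trust))
    return rating - t * deviation


if __name__ == "__main__":
    print(display_rating(1500.0, 350.0))  # 1150.0, a brand-new account
    print(display_rating(1500.0, 250.0))  # 1375.0, halfway
    print(display_rating(1500.0, 120.0))  # 1500.0, established
```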

I believe it will, because the number of points entering the system is effectively higher. 20k accounts will likely beat new 13k accounts a lot more, so the reward given to the victor will be higher. Those points are balanced by what the 13k accounts lose. But there’s a high turnover rate among beginners (some play one game and decide to go play chess or whatever), so I think the “balance” will be a bit skewed, and there might be some inflation issues. However, if we can confirm this and figure out how it’s affecting things, we should theoretically be able to compensate for it too.

Definitely something I’ll be looking into.

smurph · October 1, 2018, 8:21pm

Would it avoid confusion if we kept new accounts’ rank displayed as [?] like KGS (yet still display proper rating/deviation on the profile) and have the rest happen behind the scenes?

DVbS78rkR7NVe · October 1, 2018, 9:36pm

I wonder how much rank inflates with the current system.

Eugene · October 1, 2018, 10:04pm

The actual proposal was that display rank = rating-rank minus personal uncertainty plus average(*) uncertainty.

The plus bit means that most people’s display/matching rank will hardly change at all, because their uncertainty is close to average. Only uncertain people will thus have a different display/matching rank to their current.

(I see that ckersh88 re-proposed this in his reply also)

GaJ

I use the word “average” loosely here. This component is purely a vanity thing, to stop people’s display rank changing, so it doesn’t matter a lot about what it is, as long as it achieves that. I suspect that median is best answer for this purpose, to achieve the least amount of perceived change. But it could easily just be a static guess number like “1.5” to put things in the ballpark, or even a “site calibration” number, like “3” which we adjust from time to time if we think our ranks don’t match other sites well enough.
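Here is a sketch of the display-rank formula, with the offset chosen as the population's median deviation, as suggested above. Deviations are assumed to be in ranks, and the sample population is made up for illustration. A static offset such as 1.5, or a periodically tuned calibration number, could be passed in instead.

```python
import statistics


def display_rank(rating_rank: float, deviation: float, offset: float) -> float:
    """Display/matching rank = rating rank - personal uncertainty + offset."""
    return rating_rank - deviation + offset


def median_offset(deviations: list[float]) -> float:
    """Offset chosen so a player with a typical deviation sees no change."""
    return statistics.median(deviations)


if __name__ == "__main__":
    population = [1.4, 1.5, 1.6, 2.0, 5.0]  # hypothetical deviations, in ranks
    offset = median_offset(population)      # 1.6 for this sample
    print(display_rank(17.0, 1.6, offset))  # typical player: essentially unchanged
    print(display_rank(17.0, 5.0, offset))  # uncertain player: shown lower
```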

DVbS78rkR7NVe · October 2, 2018, 5:45pm

Hmmm, I wonder why that is.

meili_yinhua · October 2, 2018, 6:18pm

There is, of course, the caveat that playing against players with high RDs affects your rating less than playing against more established players.
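In Glicko-2, this comes from the weighting factor g(φ) = 1/sqrt(1 + 3φ²/π²), which shrinks as the opponent's deviation grows. The sketch below uses Glicko-2's single-game formulas but, as a simplifying assumption, skips the volatility step. The function names are hypothetical. For an established player, a win over an uncertain opponent moves the rating less than a win over an equally rated, established one.

```python
import math

SCALE = 173.7178  # Glicko-2 rating <-> internal scale factor


def g(rd: float) -> float:
    """Glicko-2 weight applied to an opponent with deviation `rd`."""
    phi = rd / SCALE
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def single_game_change(rating, rd, opp_rating, opp_rd, score):
    """Rating change after one game, ignoring the volatility update."""
    mu = (rating - 1500.0) / SCALE
    mu_j = (opp_rating - 1500.0) / SCALE
    phi = rd / SCALE
    w = g(opp_rd)
    e = 1.0 / (1.0 + math.exp(-w * (mu - mu_j)))  # expected score
    phi_new_sq = 1.0 / (1.0 / phi ** 2 + w ** 2 * e * (1.0 - e))
    return SCALE * phi_new_sq * w * (score - e)


if __name__ == "__main__":
    print(g(60.0), g(350.0))  # weight for an established vs. a new opponent
    print(single_game_change(1500, 60, 1500, 60, 1.0))   # beat an established player
    print(single_game_change(1500, 60, 1500, 350, 1.0))  # beat a new account
```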