Why does "restrict rank" allow ? players?

dragon-devourer · October 3, 2021, 6:14pm

I know the feeling mate! Haha!

dragon-devourer · October 3, 2021, 6:23pm

I don’t know. Seems Ok to me. Rating = strength. Deviation = confidence (or, rather, inverse of confidence). Volatility = consistency. If you are improving, hitting a slump, or playing erratically (including rank manipulators BTW), then volatility will be relatively high. On the other hand, if you have had the same true strength for a while, then volatility will be relatively low.

Edit: I need to read more carefully:

So, are you saying that, in theory, it’s a nice idea but it doesn’t actually work in practice?

Vsotvep · October 3, 2021, 6:28pm

Agreed, but then I expect the volatility of each of these players to drop, since they are consistently playing at a certain strength. Nevertheless, the volatility of the weakest and of the strongest players tends to increase over time; only that of the players in the middle seems to decrease.

I did some tests as well, where I gave one of the players a sudden improvement half-way through the total number of games, and this didn’t seem to have a lot of impact.

And eventually, the volatility, even with different tau values, does not seem to have a large impact; even after 20000 games played by one of the players, it keeps values in the range of 0.06 +/- 0.005. In some tests it even was the thing that broke down the whole ranking (by getting too large it seems to make things unstable).

One reason I could think of, is that my win-chance calculation is not realistic. I just chose the arctan as a basis, since it seems roughly what I would expect, but there’s no reason why this would be realistic.

I might of course have implemented it incorrectly, but I did check it about 3 or 4 times.

Anyways, I’ll upload the code later, then someone else could have a look (I’ve got to cook dinner now)

shinuito · October 3, 2021, 8:10pm

Maybe you need to do something volatile. For example, what if a player can quite regularly play at, say, a 2000 rating but has periods of much lower levels of play (be that because they play in the middle of the night, or while having a beer, or play tonnes of simultaneous or consecutive games, or they’re 2000 in live but much weaker/stronger in Blitz or on different board sizes, etc.)? Then does the volatility parameter change much? I mean, just by the name of it, it sounds like it should, no?

I feel like you happened to just model players that actually play at their level mostly by how they tend to randomly win a consistent fraction of the time for example (except possibly the sandbagger simulation one, that I don’t think I fully understand the details of – one player is continually having their rank reset to simulate a bunch of new players in the pool rather than actually adding new players?)

It seems you did try these to some extent already. Maybe it needs to be more drastic though: have a player that plays at 2000, then plays consistently at 1800, and then swaps back and forth. One could even add a third point of like 2100 or something. (I gave some justifications earlier, but rationalise it with different board sizes or time settings if need be.)

If that doesn’t up the volatility parameter I have no intuition what will

Can’t you just take the win chance like the expected values in Elo for example? On wikipedia I’m seeing some formula like

If we don’t care about inputting multiple games in a window, like the old system, would this have an interpretation as a win-chance?
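For reference, a minimal sketch of the standard Elo expected score, which is presumably the formula meant above: E_A = 1 / (1 + 10^((R_B - R_A) / 400)). The arctan curve next to it is only a guess at the kind of shape used in the simulation; its exact form and the `scale` value are assumptions, not the simulation's actual code.

```python
import math


def elo_expected(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def arctan_win_chance(rating_a, rating_b, scale=400.0):
    """Hypothetical arctan-shaped win chance (form and scale are assumptions)."""
    return 0.5 + math.atan((rating_a - rating_b) / scale) / math.pi


if __name__ == "__main__":
    # Compare both curves for a few rating gaps.
    for gap in (0, 100, 200, 400, 800):
        print(gap, elo_expected(1500 + gap, 1500), arctan_win_chance(1500 + gap, 1500))
```

Both curves give 0.5 for equal ratings and are symmetric, but the arctan version has much heavier tails, so it makes upsets between far-apart players more likely than Elo would.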

flovo · October 3, 2021, 8:34pm

The win chance in glicko is calculated the same way as for ELO.

The Glicko formulas do not remember the past. Everything that happened before the current “rating period” is boiled down into rating and deviation. The past volatility isn’t even used when updating the rating.

In single-game rating period Glicko, the algorithm doesn’t even have a chance to see whether a player suddenly improved. A win against a stronger opponent could either be a sign that the player is stronger than expected, or just a fluke in a series of losses against stronger opponents.
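To make this concrete, here is a minimal sketch of one Glicko-2 rating-period update, following the steps in Glickman's published description of the system. It is not OGS's actual code; the function and parameter names are made up for illustration. The only inputs are the current rating, deviation and volatility plus the results of this period.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between rating points and the internal scale


def g(phi):
    """Weight that shrinks the impact of opponents with a large deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected(mu, mu_j, phi_j):
    """Expected score against an opponent with rating mu_j and deviation phi_j."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, sigma, results, tau=0.5, eps=1e-6):
    """results: list of (opp_rating, opp_rd, score), score being 1, 0.5 or 0."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games in this period: only the deviation grows.
        return rating, math.sqrt(phi ** 2 + sigma ** 2) * SCALE, sigma

    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        e = expected(mu, mu_j, phi_j)
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        score_sum += g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * score_sum

    # New volatility: solve f(x) = 0 with the iteration from Glickman's description.
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        return num / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_sigma = math.exp(A / 2.0)

    # New deviation and rating.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + v_inv)
    new_mu = mu + new_phi ** 2 * score_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_sigma


if __name__ == "__main__":
    # Single-game rating period: one win against a stronger opponent.
    print(glicko2_update(1500, 200, 0.06, [(1700, 80, 1)]))
```

With the worked example from Glickman's description (1500/200/0.06 against 1400/30 won, 1550/100 lost, 1700/300 lost, tau = 0.5) this should give roughly 1464.06, 151.52 and 0.05999, so the volatility barely moves there either.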

shinuito · October 3, 2021, 8:44pm

I feel like that isn’t exactly true though. While they don’t explicitly remember the past in that previous states don’t enter directly into the model at the current states, I don’t know if I believe that the model doesn’t remember the past.

One could take a Fibonacci sequence for example, f_{i+2}=f_{i+1}+f_i, and then look at the tenth term with starting values f_0=1, f_1=1. Say f_10=f_9+f_8=55+34=89. One could say that the Fibonacci sequence doesn’t remember its past because it only cares about the previous two terms to compute the next term, but of course in this very simple example we could work backwards and work out the past states, so not all information is lost.
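A small sketch of that "working backwards" step, using f_i = f_{i+2} - f_{i+1}; the function names are just for illustration:

```python
def fib_forward(f0, f1, n):
    """Return [f_0, ..., f_n] for the recurrence f_{i+2} = f_{i+1} + f_i."""
    seq = [f0, f1]
    while len(seq) <= n:
        seq.append(seq[-1] + seq[-2])
    return seq


def fib_backward(f_prev, f_last, steps):
    """Recover earlier terms from two consecutive ones via f_i = f_{i+2} - f_{i+1}."""
    seq = [f_prev, f_last]
    for _ in range(steps):
        seq.insert(0, seq[1] - seq[0])
    return seq


if __name__ == "__main__":
    print(fib_forward(1, 1, 10))     # ends in ..., 34, 55, 89
    print(fib_backward(55, 89, 9))   # rebuilds the same list from (f_9, f_10)
```

The current state (f_9, f_10) is enough to rebuild the whole history here, which is exactly why "only depends on the current state" and "has lost the past" are not the same thing.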

Obviously the glicko system is more complex, and I agree that the ratings update step doesn’t care how it arrived at the state, just what the current state is. Still if one wants to understand the progression of the volatility parameter, one has to come up with a sequence of states that might lead to the desired behaviour.

Uberdude · October 3, 2021, 9:10pm

Of potential interest about how you can exploit volatility to boost your rank:

Vsotvep · October 3, 2021, 9:51pm

This is exactly what I observed in my tests: the strongest and weakest players increase in volatility, because they occasionally win / lose unexpectedly against each other, which on average seems to have the effect of increasing their volatility. If I set tau sufficiently high, and simulate enough games, the result is that at some point volatility becomes so large that a single unexpected win makes a player gain thousands of rating points in one go.

I think one way to solve the problem is to simply bound the volatility, so that it cannot grow to absurd heights.

flovo · October 3, 2021, 10:00pm

OGS already does this. Volatility is bound to never be bigger than 0.15. On OGS, one can find this mostly for the lowest rated bots (like amybot).

Vsotvep · October 3, 2021, 10:02pm

Good, then we don’t have to worry about that.

Conrad_Melville · October 4, 2021, 12:17am

Sorry to disappoint you, but I doubt it would have much effect on alt sandbaggers. Two reasons. First, most violations are never reported. One would think that score cheating would surely get reported almost always, and yet, in my experience, I estimate that at least two-thirds of score-cheated games never get reported (one can see this in the record when the cheater finally gets busted). Second, judging any sandbagging, alt or rank manipulation, on the basis of play, is difficult and uncertain, and my impression is that most mods will rightly hesitate to do it (I never did). Traditional methods of detecting alt sandbagging are still the best. Reports such as you describe would provide leads, but leads are already commonplace. The time to deal with the crush of cases is what is lacking.

Lest someone misunderstand me again, I must emphasize that I do not consider sandbagging an important consideration when deciding what the pseudo-rank for provisional players should be. I do agree that 6k is much too high and is causing huge problems. On the other hand, let’s remember that moderators being overburdened with having to make rank adjustments for players who self-selected their rank was said to be a major reason (if not the main reason) for the adoption of the current pseudo-rank system.

BHydden · October 4, 2021, 12:21am

I’d say even this number is conservative… our community is ridiculously tolerant of being cheated against

I believe anoek also agrees with this. The entry point / humble rank was never adjusted after the latest rating updates. anoek wanted to let things settle first so that a good decision could be made about where to move it once, rather than moving it over and over again.

Yes, you could say that time has come and passed… however, anoek is busy with a great many things and will get to it when he can.

Conrad_Melville · October 4, 2021, 12:25am

I agree completely, but when I talk about it I don’t want to stretch people’s credulity. I had many experiences in which I found 5 or 6 unreported, cheated games in a player’s history.

BHydden · October 4, 2021, 12:26am

yep same. probably true for most mods

dragon-devourer · October 4, 2021, 9:10am

OK, fair enough. But the other benefits:

and lack of previously cited disadvantages:

still stand though

Under the system of declared initial rank I am proposing, this would not be an issue

Current system:

Beginner (true strength = 30k) is assigned initial rank 6k, gets auto-matched against ~6k, promptly loses, complains to mods / forums, answer is “play more games and it will adjust automatically”, beginner doesn’t want to have to lose 5-6 times before getting fair matches so leaves OGS

New system:

Beginner (true strength = 30k) correctly declares initial rank 30k, gets auto-matched against ~30k, fair matches follow, everyone is happy, no complaints. It is fair to assume that most new accounts will manage to declare the correct initial rank provided the accompanying guidance information is clear and simple (and right there)
Beginner (true strength = 30k) incorrectly declares initial rank 9d, gets auto-matched against ~9d, promptly loses, complains to mods / forums asking to change declared rank, answer is “abandon account and make a new one with correct declared rank” (or just “play more games and it will adjust automatically” as before), no need for manual intervention. This will be a minority of new accounts anyway

Indeed, mods should never manually change anyone’s rating - just let the system do that automatically. Important point: Declared initial rank is just a short-cut to bypass the current [?] stage where the rating behind the [?] is way off. If someone puts in the wrong value for the short-cut, that’s just tough - now you made it a long-cut! You should have read the guidance notes that were right there next to the bit where you choose rank (we did remember to add some guidance information right there next to the bit where you choose rank, right?).

Mods should not, will not (maybe even cannot) manually change rating. The only way is to play more games or make a new account.

Groin · October 4, 2021, 9:14am

Beginners can just say “I am a beginner”, and the system assigns 30k, not themselves. No need to explain that 30k is not 1k (or 1d, or 9d … )

dragon-devourer · October 4, 2021, 10:39am

Exactly! To quote myself (from Glicko-2 does not require accounts to start at 12 kyu. So why do we do that? - #62 by dragon-devourer):

It could even be a couple of questions on the registration page. Something like this:

Ranks allow matching of player abilities for fair games. Most new accounts start with a provisional rank that shows as [?]. This should settle on an appropriate value automatically after a few games (5-6 on average). The questions below may help to reduce the number of games necessary to get a confirmed rank.

Do you know your rank? Yes / no

If yes:

Enter your rank here: [rank drop down] [association / server drop down]

Enter link to your association / server profile: … (This will allow moderators to confirm your rank and skip the [?] stage completely)

If no:

Select your level:

Beginner

Novice

Intermediate

Advanced

Don’t know

(the above would map to, say, 30 kyu, 20 kyu, 10 kyu, 1 Dan and don’t know reverts to current behaviour, i.e. 6 kyu)
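The mapping in the parentheses above could be sketched roughly like this. The rank values are the ones suggested in the post; the function and dictionary names are hypothetical, and nothing here reflects how OGS registration actually works.

```python
# Self-assessed level -> initial rank, using the values proposed in the post.
LEVEL_TO_RANK = {
    "Beginner": "30k",
    "Novice": "20k",
    "Intermediate": "10k",
    "Advanced": "1d",
    "Don't know": "6k",  # reverts to the current behaviour
}
DEFAULT_RANK = "6k"


def initial_rank(knows_rank, declared_rank=None, level=None):
    """Pick the starting rank from the answers on the (hypothetical) form."""
    if knows_rank and declared_rank:
        # "Do you know your rank? Yes": take the declared rank as-is.
        return declared_rank
    # "No" (or nothing usable entered): map the self-assessed level.
    return LEVEL_TO_RANK.get(level, DEFAULT_RANK)


if __name__ == "__main__":
    print(initial_rank(True, declared_rank="12k"))
    print(initial_rank(False, level="Beginner"))
    print(initial_rank(False, level="Don't know"))
```

Whatever comes out of this is only the starting point; the rating system still adjusts it automatically from there.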

So chances of mistakes with declared rank are small. And even if there are mistakes, there’s no need for mods to manually change rating. Therefore, no burden on mods due to declared rank

flovo · October 4, 2021, 11:44am

I don’t know how you get from increased rating drift to stability.
We are probably in this scenario:

since data from old OGS suggests most players chose the lowest rank

while the average rating of new accounts seems to be closer to the 1500 mark

If we just adjust the rating-&gt;rank formula to account for the drift, inactive accounts will increase in rank over time. In this case my question would be: how long do I have to stop playing Go to become 9d?

Vsotvep · October 4, 2021, 11:56am

No, we’ve never been and hopefully never will be in that scenario. If people will on average choose the lowest rank possible more often, we will be in the following scenario as long as we fix this lowest rank at a certain rating:

The important thing to note is that if sandbagging is indeed extreme, then the lowest possible rank to choose from will lie somewhere in the SDK ranks. If on the other hand airbagging is extreme, then the highest possible rank to choose from will lie somewhere in the SDK ranks.

I don’t think either sandbagging or airbagging will be that extreme, but I can see the scenario where sandbagging is happening relatively more often than people choosing the correct rank or airbagging, resulting in the lowest choosable rating being around 15k or so, instead of 25k.

This shouldn’t be adjusted for, since that adjustment will be what causes drift. Allowing people to choose accounts at fixed points will initially cause some drift when there is more sandbagging on average or more airbagging on average, and this initial drift will be worse or better depending on how extreme the sandbagging / airbagging on average is. But, what my data shows, is that this initial drift will disappear over time.

(Of course we can adjust for how the rating correlates to kyu / dan appropriately: that’s just a label we put on top of it and has no effect on Glicko)

shinuito · October 4, 2021, 12:16pm

I don’t think it’s just a label though because the kyu/dan correlates to the ratings, when handicap games are played.

Isn’t there some adjusted calculation of ratings done when handicap is played, so that the game can be treated as even-ish (from a rating point of view) when the ranks are different and there are handicaps?

(Not to disregard the rest of the post - but just highlight something I think I can comment on)

Edit:

What I mean is the kyu/dan labelling system directly impacts the rating calculation for handicap games. Or maybe one could translate the kyu/dan into rating windows/anchors and ignore the labelling. I guess I’m not really sure.