Pump up the rank (the wrong way): win by timeout

ckersch · August 15, 2018, 12:22am

FWIW, unlike Elo rating systems, Glicko doesn’t maintain a fixed average. If you computed all of the timeout losses as losses, but with a very low player uncertainty, you’d get a relatively small change in the rating of the timeouter without affecting the rating changes of the opponents.

JimBobBoy · August 16, 2018, 12:05am

Regarding “how to check if [time-out correspondence] games actually gave me a rank improvement” …
If you go to your “Profile” page (see red circle in attached image), you can hover your cursor at the end of your rank-versus-time plot, and it will show how many games affected your rank on a given day (where the red dot is). Doing this, I can usually tell if a “time-out” game counted or not, and by moving my cursor to an earlier date I can see the change in my rank.

Vsotvep · August 16, 2018, 4:25am

Ah, I see the ambiguity in what I’m saying now. (“It should count for neither – No, it shouldn’t count for one side” versus “It should count for both – No, it shouldn’t count for one side”). Sorry for being unclear.

However, to get back to the topic, I’m all for the option to let the player that didn’t time out decide whether the game should count as a victory or not. Although this gives sandbaggers a minor tool to prevent rank increase, it will mainly be useful to keep people who play a lot of faster correspondence games from having a slightly skewed rank.

I think it’s a minor problem though. Out of my last 50 ranked games (all correspondence) I won by timeout 14 times.
One was annulled because it was the first move, three games counted, but I considered them finished enough to be in a state where my opponent could (should?) have resigned. The other games have probably been annulled in seven cases, but there remain three games that were in an early state and did not get annulled. So that’s an estimate of about 6% of total ranked correspondence games.

Lys · August 29, 2018, 10:49am

I downloaded the data from the link provided by @opuss (many thanks!) and made the following charts.

Triangle mark pointing up: I won. Pointing down: I lost. The circle is the initial rank.
Size is deviation.
Colour is time: the paler, the older (game ended farther in time). The darker, the more recent.
X axis is rating difference between me and my opponent (positive when I’m stronger).
Y axis is rating variation after the game.

Here are all games:

We can see that the oldest games had the biggest rating variations and deviations. It makes sense.

First thing I noticed is that sometimes I won and the rating decreased, sometimes I lost and the rating increased. This doesn’t make sense to me.
Here are only those cases:

Finally, here are only the games that ended by timeout. I won all of them.

Eugene · August 29, 2018, 11:03am

Are you sure you can tell the effect of individual games on your rank? That’s only possible if you finish no more than one game per day.

Would this explain those?

flovo · August 29, 2018, 11:34am

That’s because OGS updates your “current” rating in batches of up to 15 games (or 30 days, whichever comes first).

After each ranked game your “current” rating is updated, based on the “current” rating of all your opponents in the current batch and your “base” rating.

If a batch is full, your “base” rating will be set equal to your “current” rating and a new batch begins afterwards (containing 0 games before your next game ends).
A batch is full if it contains 15 games or the next game ends more than 30 days after the first game in the current batch.

Since your opponents play games too, their “current” rating gets updated as well. If their rating changes before your batch is full, your rating can drop because of an earlier game in the current batch, even if you won your last game.
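To make the batch rule above concrete, here is a minimal sketch of how games could be grouped into batches using only the two limits quoted in the post (15 games, 30 days). The names `Batch` and `assign_batches` are hypothetical, and this is not the OGS code; it closes a batch lazily when the next game arrives, which yields the same grouping as closing it right after the 15th game.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

MAX_GAMES = 15                  # batch size quoted in the post
MAX_AGE = timedelta(days=30)    # batch age limit quoted in the post


@dataclass
class Batch:
    # End times of the games in this batch, in chronological order.
    games: list = field(default_factory=list)

    def is_full(self, next_game_end: datetime) -> bool:
        """True if the batch must be closed before the next game is added."""
        if not self.games:
            return False
        if len(self.games) >= MAX_GAMES:
            return True
        # Full if the next game ends more than 30 days after the first game.
        return next_game_end - self.games[0] > MAX_AGE


def assign_batches(game_ends):
    """Group chronologically sorted game end times into batches."""
    batches = [Batch()]
    for end in game_ends:
        if batches[-1].is_full(end):
            # This is the point where "base" would be set to "current".
            batches.append(Batch())
        batches[-1].games.append(end)
    return batches


if __name__ == "__main__":
    start = datetime(2018, 1, 1)
    daily = [start + timedelta(days=i) for i in range(16)]
    print([len(b.games) for b in assign_batches(daily)])
```

With one game per day, the sixteenth game opens a new batch; with sparse play, the 30-day limit closes the batch instead.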

The termination-api has one entry per game, possibly many per day.
The rating graphs on the profile page have only 1 entry per day for reasons I don’t know.

Lys · August 29, 2018, 12:22pm

This is not visible in the termination-api: it only shows one row per game, so I assumed that the rating increase was the difference between two adjacent rows.

Really?
Are you saying that the “current” rating increment due to the first game in the batch is computed using just one game, while the increment due to - say - the tenth game in the batch is computed using the results of ten games? And also that my “current” rating isn’t involved in any way in these calculations, except for storing the results (provisionally)?

I find this very confusing.

flovo · August 29, 2018, 1:59pm

I know.
As far as I know, there’s no documentation about the termination-api.

Yes, that’s correct. And the now-current ratings of those ten opponents.

Yes.
But OGS uses your current rating everywhere, except for updating your own rating.

By the way, the rating system OGS uses is Glicko2

flovo · August 29, 2018, 3:18pm

I’ve plotted your rating history with and without the correspondence timeouts (15 games).

The black line shows the ratings from the termination-api. The green line is the same, but calculated with my reimplementation of the OGS rating system.
The red line is your would-be rating history, calculated without the timeouts in correspondence games.

Below I plotted the difference of both ratings.

Vertical dotted lines mark the times at which I removed a timeout game.

Please note, this is only an estimate, but I think it is rather accurate.

Lys · August 29, 2018, 4:05pm

Funny how sometimes the red line is above the others (isn’t it weird?) and how sometimes it reduces its distance. I can’t understand why.
I would expect that the red line would run more and more below the others.
What’s the cause of these realignments?

In the end, the difference is less than I expected.
It also seems that the consequences of abnormal results can be fixed after a period of good results.
That’s heartening.

flovo · August 29, 2018, 4:28pm

This could have something to do with the now shifted rating periods. When OGS starts a new batch, the deviation increases from 60 to 80 points (this is visible in the termination-api). If the games are now near the beginning of such a rating batch, their short-term influence on your rating is bigger there.

The rating system estimates a win probability for each game. Improbable results (a win against a stronger player, a loss against a weaker one) change your rating more than expected results. If your rating is higher than it should be, then your losses drag your rating down more than your wins push it up.
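As a rough illustration of that win-probability estimate, here is a small sketch of the expected-score formula from the Glicko-2 guidelines (the functions g and E in Glickman's paper). The helper names and the `surprise` function are made up for this example, and OGS may differ in details; the point is only that the rating change is driven by the gap between the actual result and the expected one.

```python
import math

SCALE = 173.7178  # Glicko-2 factor between rating points and the internal scale


def g(phi):
    """Dampening factor: an opponent with a large deviation counts for less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(rating, opp_rating, opp_deviation):
    """Estimated win probability of a player rated `rating` against the opponent."""
    mu = (rating - 1500.0) / SCALE
    mu_j = (opp_rating - 1500.0) / SCALE
    phi_j = opp_deviation / SCALE
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def surprise(score, rating, opp_rating, opp_deviation):
    """Actual minus expected result; score is 1 for a win and 0 for a loss."""
    return score - expected_score(rating, opp_rating, opp_deviation)


if __name__ == "__main__":
    # A player listed at 1600 facing a 1500-rated opponent with deviation 80.
    print("win :", surprise(1, 1600, 1500, 80))
    print("loss:", surprise(0, 1600, 1500, 80))
```

For a player who is listed above the opponent, the loss term is larger in magnitude than the win term, which is why an inflated rating tends to drift back down.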

I would estimate that after about 30-40 games (no matter how far off your rating was) your rating should be back at the value it should have.

opuss · August 29, 2018, 6:20pm

Interesting. Would a rolling batch be any better? Old games could be removed when a new game is added.

flovo · August 30, 2018, 10:31am

I don’t think this would change much.

You would get something like 15 more or less independent ratings. (You now have a tally of 15 base ratings as well). And you still have to remove games older than 30 days.

And by the way, how do you define “better” in terms of rating (and deviation) over time? I want to be sure that we are both talking about the same thing.

flovo · August 30, 2018, 11:28am

That’s the difference:

I’m still interested in what your expectations are for “better” rating history.
I want to compare my own expectations with others.

opuss · August 30, 2018, 1:18pm

I thought that it might be possible to eliminate some of the quirks. In this case, the jumps in rating deviation at the start of a new batch seem somewhat artificial.

A rolling batch still won’t prevent the rating from occasionally dropping after winning a game. This is probably due to the current ratings of opponents being used. After playing one 19x19 game I even saw my 19x19 rating drop but my overall rating rise.

Having said that, I think that the current ratings are generally good and I would like to see them used more. An option to create a custom game setting an upper and lower limit on the 19x19 live rating would be useful.

Lys · August 30, 2018, 2:59pm

@flovo: could you tell me also why sometimes my rating seems to change in between one game and another? Is there some calculation that doesn’t depend on ended games ?

I can’t prove it, but I had the impression that in the last few days there was some fluctuation in my rating (something like 0.2 k) not related to the games. I don’t think it could be a case of the 30-day limit either.

flovo · August 30, 2018, 3:29pm

I don’t know if OGS recalculates your rating between games.
Since the termination-api gives one entry with your initial rating, but no other entries independent of finished games (or with outcome -1), I don’t think that recalculations happen in between.

What I’ve observed is that the rating shown on the Home page is cached. It doesn’t get updated after you end a game. You have to close the tab or restart your browser to get it updated.

Please note that I don’t know how the OGS backend works and I cannot look at its code. Nor have I looked at the open source part of OGS.
I learned about the OGS rating system by reading its introduction post in this forum, “OGS has a new Glicko-2 based rating system! [2017]”, and the implementation guidelines by the inventor of Glicko2: http://glicko.net/glicko/glicko2.pdf
With this information I’ve written a script that can reproduce ratings similar to the ones in the termination-api.
(I’m able to reproduce ratings within a small error margin of the OGS ones.) I still don’t know everything about how the OGS Glicko implementation works.

Lys · August 31, 2018, 8:32am

I agree that the Glicko algorithm should not. I wonder if timeouts are somehow involved.
I noticed that the list of games retrieved from the termination-api doesn’t contain some games ended by timeout that are visible on the profile page under game history. So I wonder when these games come back into the batch, if they do, and how the rating is recalculated.
Perhaps this could be the reason for rating readjustments between games.

I know. I often check rating variations after I end a game, and I have to load the “home” page and then hit the “refresh” button in my browser in order to see the new rating.

You did a great job!

Nobody does, except obviously for the devs, who are SO shy about this topic.
Many threads in the forum discuss rating calculations and usually get answers based on “maybe”, “I think that”, “It works well enough” from people who don’t really know what actually happens.
I think your contribution is the best available at the moment. Thank you!

What I really loved about your check on my data is that the algorithm seems to be really self-adaptive against abnormal data: a few timeout games can introduce some distortion, but after a few more games that distortion is reduced and cancelled out. This actually answers the question I raised in the original post:

They did, but only temporarily.

So:

Neither of those is necessary: just go on playing, and the rating will adjust itself.
This could become an issue only if wins by timeout are frequent and recurrent.
In that case, I believe that asking for annulment of a few of them, the most awkward ones, should be enough.

Anoek already said so:

but sometimes, you know, we don’t trust shy people much, fearing that they are hiding something. My fault. I’m very happy that your script helped me understand that.

flovo · September 4, 2018, 8:13am

Games annulled due to mass timeout are not rated, but are missing the annulled flag.

I just observed a rise of my rating from 1302 to 1320 points after a game ended by timeout and was annulled (the rise appeared some time after, not immediately; I had to restart my browser several times and observed no change the first few times). A short time later, the rating dropped back to 1302.

Looking at my rating history, the 1320 rating was my rating before 1302.
I would guess there is a minor bug: when a game gets annulled, it causes some parts of the backend (api/v1, maybe the rating system as well) to drop the latest rating and use the second-to-last one. A short time later, it recalculates your rating and we are back at the up-to-date value.

I just looked it up in the game details; it seems the rating got recalculated:

             rating                 deviation              volatility
before: 1302.1103515625         89.20301818847656     0.06296343356370926
after:  1302.1103496314377      89.20301607348608     0.06296343302576082

No big change in my case, but it’s not a precision issue.
The annulled game is not in the rating pool, it just caused the recalculation!

I wonder which value is now in the termination-api

A true Glicko2 implementation wouldn’t care (Glicko2 uses the ratings from before the rating period for both the player and the opponent).
OGS introduces a feedback mechanism (the current rating of your opponent got an update after that game, and now the rating system uses this rating in its calculation). This shouldn’t be an issue (except for some theoretical edge cases).

system · December 4, 2018, 4:13pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.