Rating system issues

Chinitsu · July 13, 2020, 2:01pm

“Player Chinitsu successfully learned to accept defeat with grace. Earned +2 EXP”

kingkaio · July 13, 2020, 3:05pm

Wins against someone 20 ranks ahead of me.

NO!! I’m gonna lose rank in 15 games because of the rating system

meili_yinhua · July 15, 2020, 11:25pm

For the life of me I can’t figure out exactly how to use the API to check, but from the information I had about my last 15 games before and after one game, the RD only makes sense with a volatility of ~0.36. That corresponds to ~62 Elo/sqrt(period), which fits my RD consistently being in the 60s and 70s.

DVbS78rkR7NVe · July 15, 2020, 11:32pm

Using the API is very simple. Take the ID number from your profile page (for meili_yinhua it is 165439) and put it into a URL of this form:
https://online-go.com/termination-api/player/165439/glicko2-history

7th column is volatility.
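For readers who would rather script this than type the URL by hand, here is a minimal Python sketch. The URL pattern and the "7th column is volatility" detail come from this post; the function names are made up for illustration, and because the thread does not say how the response is formatted, the sketch assumes the rows have already been split into columns by whatever parser fits the actual response.

```python
from typing import List, Sequence

# URL pattern taken from the post above.
BASE_URL = "https://online-go.com/termination-api/player"


def glicko2_history_url(player_id: int) -> str:
    """Build the glicko2-history URL for a player ID (hypothetical helper)."""
    return f"{BASE_URL}/{player_id}/glicko2-history"


def volatility_column(rows: Sequence[Sequence[str]]) -> List[float]:
    """Pull the 7th column (index 6) out of already-split rows.

    Assumption: every row has at least seven columns and the 7th one
    parses as a float. A header row, if any, must be dropped first.
    """
    return [float(row[6]) for row in rows]


print(glicko2_history_url(165439))
```

Fetching and splitting the response is deliberately left out, since it depends on the format the endpoint actually returns.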

It can be tricky to remember, but my browser now knows what I want: I only need to type “ter” and it suggests the link.

meili_yinhua · July 15, 2020, 11:36pm

mkay, so I see my volatility is ~0.06, but my calculations suggest it functions as if it were a much bigger number…

KillerDucky · July 15, 2020, 11:58pm

It sounds like you are confusing terms? There are three parameters: rating (r), rating deviation (RD) and rating volatility (σ).

The first two parameters, r and RD are converted in step 2 of the algorithm to mu and phi:

µ = (r − 1500)/173.7178
φ = RD/173.7178

The value of σ, the volatility, does not change.
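The step 2 conversion quoted above can be sketched in a few lines of Python. The constant 173.7178 and the two formulas are from the post; the function names are illustrative only. Volatility is not touched by this rescaling, which is why it does not appear here.

```python
GLICKO2_SCALE = 173.7178  # scale factor from step 2 of the algorithm


def to_glicko2_scale(rating: float, deviation: float) -> tuple:
    """Convert (r, RD) to (mu, phi)."""
    mu = (rating - 1500) / GLICKO2_SCALE
    phi = deviation / GLICKO2_SCALE
    return mu, phi


def from_glicko2_scale(mu: float, phi: float) -> tuple:
    """Convert (mu, phi) back to (r, RD)."""
    return GLICKO2_SCALE * mu + 1500, GLICKO2_SCALE * phi


mu, phi = to_glicko2_scale(1500, 200)
print(mu, round(phi, 4))
```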

meili_yinhua · July 16, 2020, 12:03am

volatility does change, in fact it’s the update that takes the most calculations to change

the reason for the Elo conversion is that, as I described before, volatility^2 is the variance of “true ratings” between periods. I’ve described exactly what this means elsewhere, but essentially if your volatility is 0.06, or ~10 Elo/sqrt(period), your RD (starting from 0) will go up to 10 after one period without playing, then progressively make its way to 20 after four periods (following a square root curve), 30 after nine, etc.

The key difference between glicko-1 and glicko-2 is that glicko-2 stops assuming this volatility is the same for all players (note the constant “c” in glicko-1) and progressively updates it.
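The square-root growth described in this post can be checked with a short Python sketch. It assumes the volatility stays fixed while the player is idle and applies the step 6 formula once per period; the function name is illustrative, and the starting RD of 0 is only there to reproduce the 10, 20, 30 figures.

```python
from math import sqrt

GLICKO2_SCALE = 173.7178


def rd_after_idle_periods(rd: float, volatility: float, periods: int) -> float:
    """RD (Elo scale) after `periods` rating periods with no games.

    Assumption: volatility is constant, so each idle period applies
    phi <- sqrt(phi^2 + volatility^2) on the Glicko-2 scale.
    """
    phi = rd / GLICKO2_SCALE
    for _ in range(periods):
        phi = sqrt(phi ** 2 + volatility ** 2)
    return phi * GLICKO2_SCALE


for periods in (1, 4, 9):
    print(periods, round(rd_after_idle_periods(0.0, 0.06, periods), 2))
```

With a volatility of 0.06 this gives roughly 10.4, 20.8 and 31.3 after one, four and nine periods.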

DVbS78rkR7NVe · July 16, 2020, 12:06am

Probably related to this. We should have smaller deviations as I understand it.

KillerDucky · July 16, 2020, 12:07am

I was just quoting the Glicko-2 step 2 part because I saw you converting volatility into the Elo scale, and I thought maybe you had confused the volatility and rating deviation terms. I guess you didn’t, and I haven’t done the math to see what you mean. Can you post your calculations? Honestly, by the time it gets that far I will probably not do the necessary work, so I will probably just bow out here.

meili_yinhua · July 16, 2020, 12:26am

well, my calculations are quite long, but I do have a spreadsheet here: https://docs.google.com/spreadsheets/d/1HcxHp4LiI0wJ65KUZCikDj-oecUWun_7n_6pvc2xvUE/edit?usp=sharing

redone using the data given by S_Alexander here: https://online-go.com/termination-api/player/165439/glicko2-history

You’ll notice the prior Rating and RD are from the game before, but I use the volatility from the most recent line. My reasoning for this is that volatility only enters the rating and RD calculations after it has been updated

Using my understanding of the “sliding window” rating periods containing the last 15 games, I did the update between those two, expecting the output rating and RD to be 1936.11 and 64.72 respectively. The output I get is 1918.78 and 53.73, meaning that either something is off with my calculations (and I have checked the paper rather thoroughly to make sure my result makes sense), or there is a bug in the actual ratings calculations.

Funnily enough the result gets closer to the expected values if I change the phi-star to phi-prime calculation from 1/sqrt(1/phi-star^2 + 1/v) to 1/sqrt(1/phi-star+1/v), outputting 1933.50 and 70.48

I can get it to output an RD of 64.72 by adjusting volatility alone, but that results in an output rating of 1927.99 and is only achievable with a volatility of 0.3547 (~61.6 Elo/sqrt(rating period) if converted to the Elo scale), which doesn’t bear a close relation to the reported volatility (other than being roughly (volatility*10)^2)

So I doubt it’s just volatility doing weird things (especially since that other calculation seems to have a similar effect)

EDIT: yes, I did check that the calculations fit the test case in the paper using this other sheet: https://docs.google.com/spreadsheets/d/1WHO82lK0l3q-Ek9MNRRwOnn7WL0kVU0Mscfw4ZnMoSY/edit?usp=sharing
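For anyone who prefers code to a spreadsheet, the same check can be sketched in Python. This is a minimal single-player update following the steps of Glickman's Glicko-2 paper, not the OGS implementation: the function name, the argument layout and the default `tau=0.5` (the value used in the paper's example) are choices made here, and it assumes at least one game in the period.

```python
from math import exp, log, pi, sqrt

SCALE = 173.7178


def glicko2_update(r, rd, sigma, games, tau=0.5, eps=1e-6):
    """One rating-period update; games is a list of (opp_r, opp_rd, score)."""
    # Step 2: convert to the Glicko-2 scale.
    mu, phi = (r - 1500) / SCALE, rd / SCALE

    # Steps 3 and 4: estimated variance v and improvement delta.
    v_inv, delta_sum = 0.0, 0.0
    for opp_r, opp_rd, score in games:
        mu_j, phi_j = (opp_r - 1500) / SCALE, opp_rd / SCALE
        g = 1 / sqrt(1 + 3 * phi_j ** 2 / pi ** 2)
        e = 1 / (1 + exp(-g * (mu - mu_j)))
        v_inv += g ** 2 * e * (1 - e)
        delta_sum += g * (score - e)
    v = 1 / v_inv
    delta = v * delta_sum

    # Step 5: solve for the new volatility (Illinois variant of regula falsi).
    a = log(sigma ** 2)

    def f(x):
        ex = exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        return num / (2 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2
        B, fB = C, fC
    new_sigma = exp(A / 2)

    # Step 6: pre-period deviation (note the square on the volatility).
    phi_star = sqrt(phi ** 2 + new_sigma ** 2)

    # Step 7: new deviation and rating on the Glicko-2 scale.
    phi_prime = 1 / sqrt(1 / phi_star ** 2 + 1 / v)
    mu_prime = mu + phi_prime ** 2 * delta_sum

    # Step 8: convert back.
    return SCALE * mu_prime + 1500, SCALE * phi_prime, new_sigma


# Worked example from the paper: the paper reports roughly
# r' = 1464.06, RD' = 151.52 and sigma' = 0.05999.
paper_games = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
print(glicko2_update(1500, 200, 0.06, paper_games))
```

Swapping `phi_star ** 2` for `phi_star` in step 7 reproduces the altered formula mentioned in the post, if you want to experiment with it.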

flovo · July 16, 2020, 3:30am

I don’t think anoek corrected the deviation calculation in the live code.

You can check the OGS glicko implementation if you want.

https://github.com/online-go/goratings/blob/92aea105154bf50c9ebc3ee5431b73297e7f94e8/goratings/math/glicko2.py#L159

              fB = fC
          
              safety -= 1
          
          new_volatility = exp(A / 2)
          
          # step 6
          phi_star = sqrt(player.phi ** 2 + new_volatility ** 2)
          
          # step 7
          phi_prime = 1 / sqrt(1 / phi_star ** 2 + 1 / v)
          mu_prime = player.mu + (phi_prime ** 2) * delta_sum
          
          # step 8
          ret = Glicko2Entry(
              rating=min(MAX_RATING, max(MIN_RATING, GLICKO2_SCALE * mu_prime + 1500)),
              deviation=min(MAX_RD, max(MIN_RD, GLICKO2_SCALE * phi_prime)),
              volatility=min(0.15, max(0.01, new_volatility)),
          )
          return ret

meili_yinhua · July 16, 2020, 3:50am

I looked through it. I’m not good enough at Python to do a line-by-line comparison with the test case, but everything seems to be right in the link you sent,

so it could be a live code thing, or I need a better explanation of what exactly is meant by the “sliding window”, because as far as I understand it, it takes your rating, deviation, and volatility from the last period, then updates based on the most recent 15 games, so long as they were all within the last 90 days, despite the fact that 14 of them were calculated in the previous ratings period.

The data in the sheet is according to that understanding of how it works right now

anoek · July 16, 2020, 12:34pm

Yeah, I purposefully did not update the live code, since the sliding window thing was tuned with the bug in it; I think fixing it would actually make things worse for the time being.

The bug was in step 6 FYI:

phi_star = sqrt(player.phi ** 2 + new_volatility ** 2)

at some point I must have hit ‘x’ (delete character) by accident and not caught it, because it was:

phi_star = sqrt(player.phi ** 2 + new_volatility * 2)

Fail on me for not having unit tests testing the glicko2 expected output (I did at one point verify everything was correct). The new goratings repo has unit tests for it.

Anyways, your reckoning that something funky was up with the volatility is fairly accurate, but it’s not that our volatility is crazy off, just that we had a bug in how it was used.

But all of that is somewhat moot: as soon as we get https://github.com/online-go/goratings stable we’ll be doing another update and wiping out the effects of that bug.
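To see why the step 6 bug looks like an inflated volatility, here is a small Python sketch comparing the two lines quoted above. The two `phi_star` expressions are copied from the post; the wrapper functions and the "effective volatility" helper are illustrative names, not part of goratings. Adding `volatility * 2` instead of `volatility ** 2` under the square root behaves like a correct step 6 run with a volatility of `sqrt(2 * volatility)`.

```python
from math import sqrt

GLICKO2_SCALE = 173.7178


def phi_star_correct(phi: float, new_volatility: float) -> float:
    """Step 6 as written in the goratings repo."""
    return sqrt(phi ** 2 + new_volatility ** 2)


def phi_star_buggy(phi: float, new_volatility: float) -> float:
    """Step 6 with the missing '*', as described for the live code."""
    return sqrt(phi ** 2 + new_volatility * 2)


def effective_volatility(new_volatility: float) -> float:
    """Volatility that would give the buggy result in a correct step 6."""
    return sqrt(2 * new_volatility)


sigma = 0.06
eff = effective_volatility(sigma)
print(round(eff, 4), round(eff * GLICKO2_SCALE, 2))

# Going the other way: the 0.3547 fitted earlier in the thread
# corresponds to a stored volatility of 0.3547^2 / 2.
print(round(0.3547 ** 2 / 2, 4))
```

For a stored volatility of 0.06 this gives an effective value of about 0.346, or about 60 Elo/sqrt(period). In the other direction, the fitted 0.3547 maps back to roughly 0.063, close to the ~0.06 read from the API.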

Samraku · July 16, 2020, 3:17pm

I see we have someone who knows what a good text editor looks like.

flovo · July 16, 2020, 4:04pm

There are other text editors?

Samraku · July 16, 2020, 5:04pm

Yes, vi.

Eugene · July 17, 2020, 4:57am

But with ‘x’ as delete, it was vi they are talking about?

BHydden · July 17, 2020, 5:01am

Samraku is specifically distinguishing vi from vim, because there is only one reasonable excuse for not using vim.

trohde · January 26, 2021, 11:26pm

@anoek’s Jan 26 Announcement:

<closing this thread>