2020 Rating and rank tweaks and analysis

anoek · June 24, 2020, 6:19pm

tl;dr - Ratings and ranks have been tweaked a little, so there should be a bit less volatility in ranks now. Ratings and ranks have been adjusted retroactively.

Main changes:

We were using fixed windows to update ranks, considering up to 15 games at a time. When we got to the 16th game, we’d “commit” your rank and start a new block. We now use a sliding window: we look at your most recent 15 games (up to 90 days) and use those to update your rank. This notably reduces the rating deviation and helps smooth things out.
With the 15 game block system, we were updating the ratings of your opponents in your block, so if you played a new 13k and you lost, but later it turned out they were actually a 5d player, that loss wouldn’t count for very much. The next game you played, your new rating would take that into account. This had some good aspects, but also made it possible for your rating to go down, even though you won a game (for instance if you lost against a 13k, but later they were updated to be a 25k). We no longer do this, ~~so you should no longer ever go down in rating when you win a game~~ (see note* below) (but you will also no longer have an adjusted game if you lost to a 5d who just started a new 13k account).
Because of how we implemented things, we were not able to annul games that were not in your current block of games, and we weren’t able to un-annul games. We can now do both: annul any game, and un-annul it.

Before:

After:

* Ranks can still go down with a win; this is due to the sliding window aspect. We compute your new Glicko rating by looking at what your rating was 15 games ago and accounting for the results of the games since then. Each new game moves that starting point forward by one game, so if your starting point 16 games ago was higher than your starting point 15 games ago, the new update begins from a lower base, and it’s still possible for you to go down in rating after a win.
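
As a rough illustration of the sliding window described in the note above, here is a minimal sketch. It is not the OGS implementation: `batch_update` is a simple Elo-style stand-in for the Glicko 2 update, the 90-day cutoff and the rating deviation are left out, and every name in it (`SlidingWindowPlayer`, `Game`, `k`, the 1500 starting rating) is made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Game:
    opponent_rating: float
    won: bool


def expected_score(rating: float, opponent_rating: float) -> float:
    """Elo-style win expectation (a stand-in, not Glicko 2)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def batch_update(base: float, games: List[Game], k: float = 16.0) -> float:
    """Re-rate a whole window of games starting from `base` (stand-in update)."""
    delta = sum(
        (1.0 if g.won else 0.0) - expected_score(base, g.opponent_rating)
        for g in games
    )
    return base + k * delta


@dataclass
class SlidingWindowPlayer:
    window: int = 15                 # window size mentioned in the post
    initial_rating: float = 1500.0   # assumed starting rating
    games: List[Game] = field(default_factory=list)
    # history[i] is the rating after i games have been recorded
    history: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.history.append(self.initial_rating)

    @property
    def rating(self) -> float:
        return self.history[-1]

    def record(self, game: Game) -> float:
        self.games.append(game)
        recent = self.games[-self.window:]
        # Base = the rating as it stood just before the oldest game in the window.
        base = self.history[len(self.games) - len(recent)]
        self.history.append(batch_update(base, recent))
        return self.rating
```

In this toy model, with `window=2`, a win and then a loss against 1500-rated opponents followed by a win against a 500-rated opponent leaves the rating slightly lower after the last win, because the base has moved and the earlier loss is re-evaluated from it. Real Glicko 2 differs in the details, but the moving base is the common ingredient.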

Analysis of our rating system

In August of 2017 OGS switched over to a Glicko 2 rating system and we began using a non-linear rating-to-rank function, log(rating / 850) * 31.25. An analysis of how this system is functioning has been on my to-do list for a while, and since I broke a good number of games a couple of months back and had to do a bit of work within the rating code to repair the damage, I figured I should finally take the time to check in on things and see if there were any other adjustments we should make along with the repair.
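
For anyone who wants to play with the conversion, here is a small sketch of the function quoted above and its inverse. Two things in it are assumptions rather than something stated in this post: that `log` means the natural logarithm, and that the resulting number maps to kyu/dan labels with 0 = 30k and 30 = 1d. The function names are made up for the example.

```python
import math


def rating_to_rank(rating: float) -> float:
    """log(rating / 850) * 31.25, ASSUMING a natural logarithm."""
    return math.log(rating / 850.0) * 31.25


def rank_to_rating(rank: float) -> float:
    """Inverse of rating_to_rank under the same assumption."""
    return 850.0 * math.exp(rank / 31.25)


def rank_label(rank: float) -> str:
    """ASSUMPTION: rank 0 is 30k and rank 30 is 1d."""
    if rank < 30:
        return f"{30 - rank:.1f}k"
    return f"{rank - 29:.1f}d"


if __name__ == "__main__":
    for rating in (850, 1500, 2200):
        rank = rating_to_rank(rating)
        print(rating, round(rank, 2), rank_label(rank))
```

Under those assumptions a rating of 850 sits at the bottom of the scale (30k) and a rating of 1500 lands at roughly 12k.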

The three main questions I explored were:

Is our rank conversion producing good results, and is there a better rating to rank function?
Does it make sense to continue counting 9x9, 13x13, and 19x19 games toward the same overall rank like we do?
Does it make sense to use ranks below 25k?

To do this we looked at all of our games and picked the games where:

Both players ranks were reasonably established (deviation < 120)
The game ended in either resignation or went to the score phase (no disconnects or timeouts)
The game was flagged as being ranked
White’s rank was appropriate given the handicap for the game.
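
The selection criteria above could be sketched roughly like this. The record layout, field names, and outcome labels are all hypothetical, and the last check is only a guess at what "appropriate given the handicap" means (here: the stones given are within one rank of white's lead over black, with ranks as numbers where higher is stronger).

```python
from dataclasses import dataclass

MAX_DEVIATION = 120.0                         # threshold from the list above
ACCEPTED_OUTCOMES = {"resignation", "score"}  # hypothetical outcome labels


@dataclass
class GameRecord:
    # All field names are hypothetical, chosen for this sketch.
    black_rank: float
    white_rank: float
    black_deviation: float
    white_deviation: float
    outcome: str
    ranked: bool
    handicap: int


def handicap_fits(game: GameRecord, tolerance: float = 1.0) -> bool:
    """ASSUMPTION: handicap should be within `tolerance` of white's rank lead."""
    rank_gap = game.white_rank - game.black_rank
    return abs(rank_gap - game.handicap) <= tolerance


def qualifies(game: GameRecord) -> bool:
    """Apply the four criteria from the list above to one game."""
    return (
        game.black_deviation < MAX_DEVIATION
        and game.white_deviation < MAX_DEVIATION
        and game.outcome in ACCEPTED_OUTCOMES
        and game.ranked
        and handicap_fits(game)
    )
```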

There were 9.08M games that met all of these requirements. As a side note and interesting statistic, it takes an average of 6.43 games for a player to get out of provisional status, and 12.02 games to achieve a pretty stable rank (as defined by having an RD < 120). Overall I consider that very good and am quite pleased with how fast Glicko2 can properly establish the strength of a player.

1. Is our rank conversion producing good results, and is there a better rating to rank function?

First and foremost, how are we doing at matching players up and picking a good handicap for them? To evaluate this we looked at all of our games that met the above criteria and looked at the win rates for black. For comparison I’ve included similarly computed win rates from the European Go Database (EGD).

(Chart: black's win rate by handicap, EGD vs. OGS)

Overall, not too bad! We expect to see a white bias in handicap games because white is always the stronger player and black only gets one stone per full rank difference, so we’re not shooting for 50% here. Ideally, though, we’d see something fairly flat as the rank difference and number of stones given increase, which we do for the most part. A different trend would indicate that the spacing between ranks isn’t quite ideal.

That being said, I did take some time to explore alternative mappings to try to flatten that line a bit more, get rid of the 9 handicap dip, and in general smooth out the per-rank-band handicap results (which are a little noisier than the smooth line above). While I came up with a few other comparable functions, none were notably better than what we already have. So overall I’m pretty satisfied with our rating system and rank mapping, and I think we can pretty much consider this to be the rating and ranking system we’ll be using for the foreseeable future, until someone smarter comes along with a better one.

2. Does it make sense to continue counting 9x9, 13x13, and 19x19 games toward the same overall rank like we do?

Whether we combine ranks into an “overall” rank or use separate ranks has been a debated topic since we started this project. To answer this, I computed ratings and ranks of players considering only 9x9, 13x13, 19x19, or all games, and considered only games that met the criteria above (RD<120 etc). Then we looked at how each rating system was doing when they were predicting games, comparing the resulting win rates for black with the expected win rates for black. The closer those two numbers are, the better.

This is a set of tables for handicaps 0 through 3, each row shows a rank band from the rank listed until the next rank (so 30k-26k for the first row in each entry). There are two numbers separated by a colon (:), the first is the actual win rate (%) for black, and the second is the expected win rate (%) for black (that is to say, what the rating system thought the win rate should be). n is the number of games in that block.
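
If you want to crunch these tables yourself, a small parser along these lines should work on the rows as they are printed below. It is only a sketch: the names (`Cell`, `parse_row`, `weighted_abs_gap`, `std_error_pct`) are made up, and the standard error is the usual binomial approximation, included as a reminder that cells with small n are very noisy.

```python
import math
import re
from typing import List, NamedTuple, Optional, Tuple


class Cell(NamedTuple):
    actual: float    # actual win rate for black, in percent
    expected: float  # win rate the rating system expected, in percent
    n: int           # number of games in the cell


# Matches either an empty cell ("--") or "actual : expected [n=count]".
TOKEN_RE = re.compile(
    r"--|(\d+(?:\.\d+)?)\s*:\s*(\d+(?:\.\d+)?)\s*\[n=(\d+)\]"
)


def parse_row(line: str) -> Tuple[str, List[Optional[Cell]]]:
    """Split one table row into its rank label and one entry per handicap."""
    label, _, rest = line.strip().partition(" ")
    cells: List[Optional[Cell]] = []
    for match in TOKEN_RE.finditer(rest):
        if match.group(0) == "--":
            cells.append(None)
        else:
            cells.append(
                Cell(float(match.group(1)), float(match.group(2)), int(match.group(3)))
            )
    return label, cells


def weighted_abs_gap(cells: List[Optional[Cell]]) -> float:
    """Game-weighted mean of |actual - expected| over the non-empty cells."""
    present = [c for c in cells if c is not None]
    total = sum(c.n for c in present)
    return sum(abs(c.actual - c.expected) * c.n for c in present) / total


def std_error_pct(cell: Cell) -> float:
    """Binomial standard error of the actual win rate, in percentage points."""
    p = cell.actual / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / cell.n)
```

A gap between the two numbers that is only a couple of standard errors wide is probably not worth reading much into.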

9x9 ranks predicting 9x9 games

            Handicap 0               Handicap 1             Handicap 2             Handicap 3

    30k  47.8 : 49.6 [n=2579]    33.3 : 49.8 [n=39]     28.0 : 48.5 [n=25]      40.7 : 48.9 [n=27]  
    25k  48.4 : 49.7 [n=20436]   44.7 : 48.7 [n=4417]   54.7 : 49.6 [n=311]     65.4 : 50.1 [n=104] 
    20k  47.7 : 49.6 [n=60398]   45.0 : 48.9 [n=32404]  51.5 : 49.5 [n=555]     63.6 : 49.6 [n=176] 
    15k  47.4 : 49.4 [n=207911]  44.7 : 48.8 [n=70964]  54.0 : 49.3 [n=2123]    63.6 : 49.7 [n=558] 
    10k  48.9 : 49.4 [n=333649]  45.6 : 49.1 [n=44339]  55.5 : 49.6 [n=1802]    71.4 : 50.3 [n=245] 
    5k   49.3 : 50.2 [n=83326]   44.2 : 49.9 [n=2287]   61.4 : 50.0 [n=451]     75.0 : 51.4 [n=96]  
    1d   52.8 : 50.9 [n=4699]    58.3 : 50.1 [n=24]     100.0 : 52.2 [n=3]      --  
    6d   46.7 : 54.1 [n=15]      --                     --                      --  
    ALL 48.4 : 49.5 [n=713013]   45.0 : 48.9 [n=154474]  54.8 : 49.5 [n=5270]    65.8 : 50.0 [n=1206]    

13x13 ranks predicting 13x13 games
    30k 44.6 : 49.6 [n=121]     47.5 : 49.9 [n=219]     100.0 : 48.3 [n=1]      50.0 : 47.1 [n=2]   
    25k 48.0 : 49.5 [n=2773]    45.5 : 49.7 [n=1756]    52.9 : 49.5 [n=104]     32.6 : 48.9 [n=43]  
    20k 44.0 : 49.3 [n=14530]   45.4 : 49.8 [n=11125]   44.4 : 49.4 [n=941]     42.7 : 49.4 [n=293] 
    15k 45.8 : 49.3 [n=60647]   46.5 : 50.0 [n=16939]   49.7 : 48.8 [n=2864]    39.7 : 49.1 [n=648] 
    10k 47.3 : 49.4 [n=85200]   47.8 : 50.3 [n=12933]   59.7 : 49.4 [n=2635]    48.0 : 49.9 [n=306] 
    5k  49.3 : 50.3 [n=17200]   45.4 : 50.0 [n=866]     58.3 : 49.8 [n=357]     57.6 : 49.9 [n=33]  
    1d  46.2 : 51.0 [n=249]     --                      --                      --  
    6d  66.7 : 60.7 [n=3]       --                      --                      --  
    ALL 46.7 : 49.4 [n=180723]  46.5 : 50.0 [n=43838]   53.3 : 49.2 [n=6902]    42.5 : 49.3 [n=1325]    


19x19 ranks predicting 19x19 games
    30k 48.4 : 49.6 [n=1504]    42.9 : 49.1 [n=212]     39.2 : 49.4 [n=227]     34.3 : 49.7 [n=67]  
    25k 45.9 : 49.5 [n=18333]   40.9 : 49.6 [n=4517]    40.0 : 49.7 [n=4083]    41.2 : 49.8 [n=857] 
    20k 47.3 : 49.5 [n=98224]   42.4 : 49.2 [n=20335]   41.3 : 49.2 [n=14927]   42.2 : 49.3 [n=4369]    
    15k 49.0 : 49.4 [n=235420]  43.5 : 48.7 [n=40980]   43.3 : 48.6 [n=26238]   45.5 : 48.9 [n=10026]   
    10k 49.6 : 49.3 [n=344796]  45.0 : 48.6 [n=45871]   46.3 : 48.5 [n=23862]   47.3 : 48.9 [n=9851]    
    5k  49.7 : 49.5 [n=245545]  44.8 : 48.7 [n=17596]   44.7 : 48.4 [n=8277]    41.8 : 48.4 [n=3881]    
    1d  49.2 : 49.8 [n=33502]   42.6 : 49.5 [n=2086]    47.5 : 50.0 [n=1100]    47.5 : 50.0 [n=589] 
    6d  48.9 : 50.6 [n=1585]    54.1 : 49.4 [n=181]     70.0 : 50.3 [n=50]      76.9 : 53.3 [n=26]  
    ALL 49.2 : 49.4 [n=978909]  43.9 : 48.8 [n=131778]  43.9 : 48.8 [n=78764]   45.0 : 48.9 [n=29666]   



Combined rating system


Combined ranks predicting 9x9 games
    30k 49.2 : 49.8 [n=17573]   47.8 : 48.6 [n=2060]    49.2 : 49.9 [n=240]     68.0 : 49.9 [n=125] 
    25k 48.6 : 49.7 [n=43982]   45.5 : 48.7 [n=22370]   56.2 : 49.8 [n=299]     63.0 : 50.5 [n=100] 
    20k 48.1 : 49.5 [n=117350]  45.1 : 48.4 [n=56551]   53.7 : 49.3 [n=779]     70.5 : 49.6 [n=325] 
    15k 49.0 : 49.5 [n=262529]  45.4 : 48.1 [n=53642]   57.7 : 49.5 [n=1413]    72.6 : 49.8 [n=332] 
    10k 50.6 : 49.7 [n=206142]  47.9 : 48.6 [n=14727]   57.1 : 49.7 [n=1120]    71.7 : 50.8 [n=237] 
    5k  49.9 : 50.2 [n=49161]   43.4 : 49.0 [n=648]     60.5 : 50.6 [n=124]     75.0 : 49.2 [n=12]  
    1d  54.9 : 50.6 [n=4802]    52.1 : 47.7 [n=73]      100.0 : 51.5 [n=9]      87.5 : 53.4 [n=8]   
    6d  59.6 : 51.8 [n=171]     --                      --                      --  
    ALL 49.4 : 49.6 [n=701710]  45.6 : 48.4 [n=150071]  56.3 : 49.6 [n=3984]    70.6 : 50.1 [n=1139]    

Combined ranks predicting 13x13 games
    30k 49.3 : 49.6 [n=2094]    49.5 : 49.8 [n=2393]    65.6 : 49.0 [n=32]      48.6 : 49.9 [n=37]  
    25k 47.5 : 49.5 [n=8051]    46.9 : 49.5 [n=8310]    41.6 : 48.8 [n=308]     33.3 : 49.1 [n=33]  
    20k 46.5 : 49.2 [n=24573]   46.3 : 49.6 [n=20826]   49.8 : 48.6 [n=1222]    46.2 : 49.6 [n=279] 
    15k 47.6 : 49.2 [n=78278]   47.3 : 49.5 [n=24001]   50.5 : 48.2 [n=2349]    41.1 : 49.5 [n=445] 
    10k 49.0 : 49.4 [n=86805]   47.9 : 50.0 [n=13165]   57.8 : 48.3 [n=1500]    48.4 : 50.0 [n=182] 
    5k  50.5 : 50.2 [n=19668]   45.6 : 49.9 [n=1041]    56.2 : 48.9 [n=169]     46.7 : 47.2 [n=15]  
    1d  54.5 : 50.7 [n=985]     60.7 : 51.2 [n=28]      100.0 : 45.8 [n=4]      --  
    6d  40.5 : 51.8 [n=42]      --                      --                      --  
    ALL 48.3 : 49.4 [n=220496]  47.1 : 49.6 [n=69764]   52.1 : 48.4 [n=5584]    44.0 : 49.6 [n=991] 

Combined ranks predicting 19x19 games
    30k 50.5 : 49.6 [n=2299]    44.3 : 48.9 [n=221]     36.9 : 48.5 [n=222]     31.7 : 49.5 [n=63]  
    25k 48.0 : 49.5 [n=10385]   42.3 : 48.9 [n=3479]    40.6 : 48.9 [n=3215]    38.3 : 49.2 [n=664] 
    20k 47.7 : 49.4 [n=61028]   41.1 : 48.7 [n=18301]   39.3 : 48.9 [n=15631]   39.9 : 49.0 [n=3557]    
    15k 48.7 : 49.3 [n=224916]  42.1 : 48.4 [n=48650]   41.1 : 48.6 [n=34843]   41.8 : 48.9 [n=12427]   
    10k 49.2 : 49.3 [n=410840]  43.1 : 48.3 [n=68206]   43.2 : 48.5 [n=40606]   44.0 : 48.9 [n=15422]   
    5k  49.6 : 49.4 [n=381050]  44.3 : 48.4 [n=32244]   44.5 : 48.3 [n=15511]   43.1 : 48.4 [n=6854]    
    1d  49.4 : 49.9 [n=60938]   41.3 : 49.0 [n=3580]    46.5 : 49.2 [n=2010]    44.3 : 49.0 [n=1061]    
    6d  49.0 : 50.6 [n=2519]    53.2 : 50.0 [n=220]     66.1 : 50.4 [n=59]      84.8 : 52.3 [n=33]  
    ALL 49.2 : 49.4 [n=1153975] 42.8 : 48.4 [n=174901]  42.2 : 48.6 [n=112097]  42.7 : 48.8 [n=40081]   



Combined prediction of all games
    30k 49.3 : 49.8 [n=21966]   48.5 : 49.2 [n=4674]    44.7 : 49.2 [n=494]     54.7 : 49.8 [n=225] 
    25k 48.3 : 49.6 [n=62418]   45.5 : 48.9 [n=34159]   41.9 : 48.9 [n=3822]    41.2 : 49.4 [n=797] 
    20k 47.8 : 49.4 [n=202951]  44.6 : 48.7 [n=95678]   40.7 : 48.9 [n=17632]   42.7 : 49.1 [n=4161]    
    15k 48.7 : 49.4 [n=565723]  44.5 : 48.5 [n=126293]  42.2 : 48.6 [n=38605]   42.6 : 48.9 [n=13204]   
    10k 49.6 : 49.4 [n=703787]  44.5 : 48.6 [n=96098]   44.1 : 48.5 [n=43226]   44.5 : 48.9 [n=15841]   
    5k  49.7 : 49.5 [n=449879]  44.3 : 48.5 [n=33933]   44.7 : 48.3 [n=15804]   43.1 : 48.4 [n=6881]    
    1d  49.8 : 49.9 [n=66725]   41.6 : 49.0 [n=3681]    46.8 : 49.2 [n=2023]    44.6 : 49.1 [n=1069]    
    6d  49.6 : 50.7 [n=2732]    53.2 : 50.0 [n=220]     66.1 : 50.4 [n=59]      84.8 : 52.3 [n=33]  
    ALL 49.2 : 49.5 [n=2076181] 44.6 : 48.6 [n=394736]  43.1 : 48.6 [n=121665]  43.5 : 48.9 [n=42211]

From this data there are two things to note:

Using a combined rating works quite well, certainly comparable to or better than looking at per-size strengths by themselves. It seems to me like it makes sense to keep using it.
Using overall ratings to predict 9x9 games works pretty well at HC 0 and at HC 1, indicating to me that the strength bands are pretty compatible with 19x19 or just “go ranks” in general. However, going beyond HC 1, predictions start to get bad pretty quickly. I believe this is an indication that the “Old Japanese Recommendation” is not so great for us, and that we should strongly consider figuring out what the best 9x9 (and probably 13x13) handicap setup should be.
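
To make the second point concrete, here is a tiny worked example using only the "ALL" rows of the two 9x9 tables above (9x9 ranks predicting 9x9 games, and combined ranks predicting 9x9 games). The dictionary and function names are made up for the example.

```python
# (actual %, expected %) for black, keyed by handicap, copied from the ALL rows.
NINE_BY_NINE_RANKS = {0: (48.4, 49.5), 1: (45.0, 48.9), 2: (54.8, 49.5), 3: (65.8, 50.0)}
COMBINED_RANKS = {0: (49.4, 49.6), 1: (45.6, 48.4), 2: (56.3, 49.6), 3: (70.6, 50.1)}


def gaps(rows):
    """Actual minus expected win rate for black, per handicap, in points."""
    return {hc: round(actual - expected, 1) for hc, (actual, expected) in rows.items()}


def report():
    """Print the per-handicap gaps for both rating systems side by side."""
    nine, combined = gaps(NINE_BY_NINE_RANKS), gaps(COMBINED_RANKS)
    for hc in sorted(nine):
        print(f"HC {hc}: 9x9 ranks {nine[hc]:+.1f}, combined ranks {combined[hc]:+.1f}")


if __name__ == "__main__":
    report()
```

The gaps come out to -1.1, -3.9, +5.3, +15.8 points for the 9x9-only ranks and -0.2, -2.8, +6.7, +20.5 for the combined ranks: both systems are close at HC 0 and HC 1, and both swing heavily toward black at HC 2 and HC 3, which points at the handicap setup rather than the choice of rating.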

EDIT: The question arose about considering blitz vs live vs correspondence ranks. Here’s the data from that, which I believe is still very supportive of using an overall rank for picking handicap.

Considering only 19x19 games:

19x19 blitz ranks predicting blitz

            Handicap 0               Handicap 1             Handicap 2             Handicap 3

    30k --                      --                      --                     --  
    25k 60.0 : 49.1 [n=30]      --                      --                     --  
    20k 40.3 : 49.1 [n=5422]    43.4 : 49.2 [n=387]     42.8 : 49.5 [n=458]    41.5 : 49.7 [n=398]
    15k 44.9 : 49.5 [n=25773]   42.8 : 49.7 [n=3537]    44.1 : 49.7 [n=2679]   44.2 : 49.8 [n=2049]   
    10k 47.6 : 49.5 [n=35936]   44.5 : 49.8 [n=4946]    44.6 : 49.6 [n=2652]   44.1 : 49.7 [n=1563]   
    5k  48.2 : 49.9 [n=10003]   42.6 : 50.2 [n=1029]    45.0 : 49.7 [n=460]    38.2 : 49.2 [n=293]
    1d  45.9 : 49.3 [n=379]     57.6 : 48.9 [n=99]      61.4 : 50.5 [n=44]     81.0 : 51.3 [n=21]  
    6d  33.3 : 41.2 [n=6]       --                      --                     --  
    ALL 46.2 : 49.5 [n=77549]   43.8 : 49.7 [n=9998]    44.4 : 49.7 [n=6296]   43.7 : 49.7 [n=4326]    


19x19 live ranks predicting live
    30k 47.3 : 49.6 [n=886]     38.4 : 49.3 [n=146]     42.3 : 49.5 [n=130]    42.6 : 49.9 [n=47] 
    25k 45.7 : 49.4 [n=11854]   41.6 : 49.5 [n=3299]    40.0 : 49.7 [n=3125]   40.1 : 49.6 [n=604]    
    20k 46.9 : 49.4 [n=71949]   42.0 : 49.3 [n=17356]   40.5 : 49.4 [n=12812]  41.9 : 49.3 [n=3404]   
    15k 48.5 : 49.4 [n=184092]  43.7 : 49.0 [n=34942]   43.3 : 48.8 [n=21729]  46.2 : 49.0 [n=7439]   
    10k 49.3 : 49.2 [n=275964]  45.1 : 48.8 [n=37168]   46.2 : 48.8 [n=18604]  47.0 : 49.2 [n=6431]   
    5k  49.4 : 49.3 [n=187928]  45.4 : 49.2 [n=11581]   44.6 : 48.9 [n=4973]   41.5 : 48.5 [n=2124]  
    1d  48.4 : 49.6 [n=23931]   42.6 : 49.5 [n=1496]    47.1 : 50.0 [n=751]    46.2 : 50.2 [n=463] 
    6d  47.9 : 50.6 [n=823]     36.0 : 49.8 [n=50]      57.1 : 52.3 [n=7]      50.0 : 58.6 [n=2]  
    ALL 48.8 : 49.3 [n=757427]  44.0 : 49.0 [n=106038]  43.6 : 49.0 [n=62131]  45.1 : 49.1 [n=20514]   

19x19 corr ranks predicting corr
    30k 62.2 : 48.9 [n=37]      0.0 : 47.9 [n=4]        0.0 : 47.5 [n=2]       100.0 : 53.4 [n=1] 
    25k 47.2 : 49.4 [n=619]     48.8 : 49.0 [n=41]      51.0 : 49.7 [n=51]     61.9 : 49.3 [n=21] 
    20k 48.5 : 49.7 [n=5363]    45.5 : 49.8 [n=277]     44.0 : 49.5 [n=282]    45.1 : 49.9 [n=182]    
    15k 48.9 : 49.7 [n=18366]   44.6 : 49.8 [n=1029]    42.7 : 49.5 [n=975]    47.1 : 49.7 [n=690]    
    10k 50.1 : 49.8 [n=28380]   44.7 : 50.0 [n=1346]    49.4 : 50.0 [n=1147]   52.3 : 50.1 [n=710] 
    5k  52.1 : 50.3 [n=15582]   39.5 : 50.2 [n=511]     48.7 : 50.3 [n=417]    42.6 : 49.6 [n=263]    
    1d  51.7 : 50.6 [n=2695]    42.1 : 50.4 [n=57]      47.8 : 50.7 [n=23]     41.2 : 51.4 [n=17] 
    6d  55.2 : 51.1 [n=134]     --                      --                     --  
    ALL 50.1 : 49.9 [n=71176]   43.9 : 49.9 [n=3265]    46.5 : 49.8 [n=2897]   48.4 : 49.9 [n=1884]    



19x19 overall ranks predicting blitz

            Handicap 0               Handicap 1             Handicap 2             Handicap 3

    30k 50.4 : 49.6 [n=125]     23.1 : 48.7 [n=13]      44.4 : 50.4 [n=9]      83.3 : 49.7 [n=6]  
    25k 43.3 : 49.4 [n=1189]    38.9 : 50.0 [n=90]      35.3 : 50.4 [n=85]     35.5 : 49.7 [n=93] 
    20k 45.8 : 49.3 [n=8790]    42.0 : 49.1 [n=659]     45.0 : 49.2 [n=644]    42.0 : 49.7 [n=395]    
    15k 48.9 : 49.4 [n=21681]   42.1 : 48.6 [n=2786]    44.0 : 48.6 [n=2259]   47.5 : 48.7 [n=1688]   
    10k 49.5 : 49.4 [n=33029]   44.9 : 49.1 [n=5802]    46.9 : 48.8 [n=3376]   46.3 : 48.7 [n=2404]   
    5k  50.3 : 49.8 [n=35639]   44.6 : 48.4 [n=4863]    45.4 : 48.2 [n=2526]   43.6 : 48.3 [n=1225]  
    1d  49.1 : 50.0 [n=4395]    44.7 : 49.6 [n=385]     47.6 : 49.8 [n=212]    48.7 : 49.7 [n=113]    
    6d  48.5 : 49.4 [n=268]     64.8 : 49.7 [n=105]     75.6 : 50.4 [n=41]     88.9 : 53.1 [n=18] 
    ALL 49.2 : 49.6 [n=105116]  44.2 : 48.8 [n=14703]   45.7 : 48.7 [n=9152]   45.9 : 48.7 [n=5942]    

19x19 overall ranks predicting live
    30k 50.1 : 49.6 [n=997]     45.9 : 49.0 [n=170]     35.7 : 49.4 [n=185]    27.5 : 49.7 [n=51] 
    25k 45.7 : 49.5 [n=14269]   40.8 : 49.6 [n=4189]    40.0 : 49.7 [n=3781]   41.1 : 49.7 [n=683]    
    20k 47.1 : 49.4 [n=77569]   42.3 : 49.1 [n=18887]   40.9 : 49.2 [n=13695]  41.7 : 49.2 [n=3635]   
    15k 48.8 : 49.4 [n=191964]  43.4 : 48.7 [n=36924]   43.0 : 48.6 [n=22897]  45.0 : 48.8 [n=7702]   
    10k 49.6 : 49.3 [n=285823]  44.9 : 48.5 [n=38809]   46.1 : 48.4 [n=19351]  46.9 : 48.9 [n=6762]   
    5k  49.4 : 49.4 [n=194764]  44.9 : 48.8 [n=12065]   44.1 : 48.4 [n=5250]   41.0 : 48.3 [n=2348]  
    1d  48.8 : 49.7 [n=25554]   42.7 : 49.6 [n=1582]    47.3 : 50.0 [n=822]    48.0 : 49.9 [n=450]    
    6d  49.0 : 50.7 [n=966]     39.5 : 49.1 [n=76]      44.4 : 49.4 [n=9]      50.0 : 53.8 [n=8]  
    ALL 49.0 : 49.4 [n=791906]  43.8 : 48.8 [n=112702]  43.4 : 48.7 [n=65990]  44.5 : 48.9 [n=21639]   

19x19 overall ranks predicting corr
    30k 43.5 : 49.5 [n=382]     34.5 : 49.4 [n=29]      57.6 : 48.8 [n=33]     40.0 : 50.1 [n=10]  
    25k 48.0 : 49.7 [n=2875]    43.3 : 49.9 [n=238]     42.9 : 50.1 [n=217]    48.1 : 50.4 [n=81]  
    20k 49.5 : 49.7 [n=11865]   46.3 : 49.8 [n=789]     46.3 : 49.8 [n=588]    48.1 : 49.7 [n=339]    
    15k 50.9 : 49.8 [n=21775]   47.6 : 49.7 [n=1270]    48.1 : 49.5 [n=1082]   47.0 : 50.0 [n=636]    
    10k 50.4 : 49.8 [n=25944]   48.1 : 49.6 [n=1260]    48.6 : 49.7 [n=1135]   54.6 : 49.9 [n=685]    
    5k  52.3 : 50.2 [n=15142]   43.9 : 49.3 [n=668]     47.5 : 50.0 [n=501]    40.9 : 49.2 [n=308]    
    1d  52.5 : 50.4 [n=3553]    35.3 : 49.1 [n=119]     48.5 : 49.7 [n=66]     34.6 : 51.9 [n=26] 
    6d  49.0 : 51.4 [n=351]     --                      --                     --  
    ALL 50.7 : 49.9 [n=81887]   46.3 : 49.6 [n=4373]    47.7 : 49.7 [n=3622]   48.6 : 49.8 [n=2085]    


19x19 overall ranks predicting overall
    30k 48.4 : 49.6 [n=1504]    42.9 : 49.1 [n=212]     39.2 : 49.4 [n=227]    34.3 : 49.7 [n=67] 
    25k 45.9 : 49.5 [n=18333]   40.9 : 49.6 [n=4517]    40.0 : 49.7 [n=4083]   41.2 : 49.8 [n=857]    
    20k 47.3 : 49.5 [n=98224]   42.4 : 49.2 [n=20335]   41.3 : 49.2 [n=14927]  42.2 : 49.3 [n=4369]   
    15k 49.0 : 49.4 [n=235420]  43.5 : 48.7 [n=40980]   43.3 : 48.6 [n=26238]  45.5 : 48.9 [n=10026]  
    10k 49.6 : 49.3 [n=344796]  45.0 : 48.6 [n=45871]   46.3 : 48.5 [n=23862]  47.3 : 48.9 [n=9851]   
    5k  49.7 : 49.5 [n=245545]  44.8 : 48.7 [n=17596]   44.7 : 48.4 [n=8277]   41.8 : 48.4 [n=3881]  
    1d  49.2 : 49.8 [n=33502]   42.6 : 49.5 [n=2086]    47.5 : 50.0 [n=1100]   47.5 : 50.0 [n=589]    
    6d  48.9 : 50.6 [n=1585]    54.1 : 49.4 [n=181]     70.0 : 50.3 [n=50]     76.9 : 53.3 [n=26] 
    ALL 49.2 : 49.4 [n=978909]  43.9 : 48.8 [n=131778]  43.9 : 48.8 [n=78764]  45.0 : 48.9 [n=29666]

Considering all games

Blitz ranks predicting blitz

            Handicap 0               Handicap 1             Handicap 2             Handicap 3

    30k 38.5 : 49.1 [n=78]      33.3 : 48.2 [n=6]      0.0 : 46.3 [n=1]       --  
    25k 47.4 : 49.6 [n=1933]    43.9 : 49.3 [n=196]    49.4 : 49.8 [n=79]     62.1 : 50.6 [n=29]  
    20k 43.4 : 49.1 [n=21095]   43.2 : 49.5 [n=2188]   47.2 : 49.4 [n=788]    39.1 : 49.8 [n=322]
    15k 45.5 : 49.2 [n=131528]  43.6 : 49.6 [n=9565]   44.4 : 49.5 [n=3662]   42.1 : 49.7 [n=1957]   
    10k 47.8 : 49.5 [n=136510]  44.5 : 50.0 [n=10457]  45.7 : 49.9 [n=4458]   43.0 : 50.0 [n=2779]   
    5k  47.6 : 50.0 [n=36841]   44.4 : 50.1 [n=3434]   43.0 : 49.6 [n=1621]   42.1 : 50.2 [n=744]
    1d  50.3 : 50.4 [n=1271]    53.7 : 48.7 [n=123]    54.7 : 49.6 [n=64]     58.1 : 50.8 [n=43]  
    6d  52.0 : 48.7 [n=25]      0.0 : 51.9 [n=4]       100.0 : 59.8 [n=1]     --  
    ALL 46.5 : 49.4 [n=329281]  44.1 : 49.8 [n=25973]  45.0 : 49.7 [n=10674]  42.6 : 49.9 [n=5874]    


Live ranks predicting live
    30k 49.6 : 49.8 [n=20325]   46.8 : 49.3 [n=5587]   43.5 : 49.1 [n=437]    56.7 : 49.8 [n=171]    
    25k 48.3 : 49.6 [n=39715]   45.5 : 49.1 [n=28644]  42.5 : 49.1 [n=2850]   39.9 : 49.4 [n=479]    
    20k 47.6 : 49.5 [n=149652]  45.0 : 49.0 [n=81606]  40.7 : 49.1 [n=14237]  44.0 : 49.4 [n=2973]   
    15k 48.2 : 49.3 [n=398094]  44.7 : 48.9 [n=104484] 42.1 : 48.9 [n=30877]  42.8 : 49.2 [n=9499]   
    10k 49.1 : 49.3 [n=483024]  44.6 : 49.0 [n=73713]  44.3 : 48.9 [n=32129]  45.3 : 49.4 [n=10203]  
    5k  49.4 : 49.3 [n=318943]  45.0 : 49.1 [n=21888]  45.2 : 49.0 [n=9504]   43.2 : 48.8 [n=3456]   
    1d  48.9 : 49.7 [n=43212]   42.4 : 49.5 [n=2279]   46.7 : 49.7 [n=1225]   43.7 : 49.5 [n=762]    
    6d  48.0 : 50.7 [n=1479]    40.7 : 49.8 [n=86]     50.0 : 54.7 [n=12]     50.0 : 58.8 [n=6]  
    ALL 48.7 : 49.4 [n=1454444] 44.9 : 49.0 [n=318287] 43.0 : 49.0 [n=91271]  43.9 : 49.3 [n=27549]   

Corr ranks predicting corr
    30k 60.7 : 48.8 [n=28]      --                     --                     --  
    25k 51.5 : 49.4 [n=643]     35.7 : 49.3 [n=28]     48.5 : 49.2 [n=33]     41.4 : 49.3 [n=29] 
    20k 47.6 : 49.5 [n=10436]   44.3 : 49.5 [n=575]    47.1 : 49.7 [n=376]    43.6 : 49.7 [n=291]    
    15k 48.5 : 49.8 [n=34322]   42.6 : 49.7 [n=2050]   44.9 : 49.9 [n=1353]   44.2 : 49.7 [n=872]    
    10k 50.4 : 49.8 [n=48021]   45.6 : 49.9 [n=2192]   47.5 : 50.0 [n=1648]   48.9 : 50.2 [n=876] 
    5k  51.6 : 50.2 [n=29561]   42.6 : 50.1 [n=756]    47.8 : 50.3 [n=604]    46.0 : 49.9 [n=402]    
    1d  52.8 : 50.7 [n=5150]    47.4 : 50.4 [n=76]     59.6 : 52.2 [n=52]     54.5 : 51.6 [n=22] 
    6d  54.8 : 51.7 [n=283]     --                     --                     --  
    ALL 50.1 : 49.9 [n=128444]  43.9 : 49.8 [n=5677]   46.8 : 50.0 [n=4066]   46.1 : 49.9 [n=2493]    



Overall ranks predicting blitz

            Handicap 0               Handicap 1             Handicap 2             Handicap 3

    30k 49.3 : 49.8 [n=2830]    48.1 : 49.5 [n=183]    56.6 : 50.2 [n=53]     46.4 : 49.7 [n=28] 
    25k 47.5 : 49.5 [n=8120]    47.1 : 49.4 [n=900]    44.8 : 49.2 [n=210]    43.8 : 49.5 [n=112]    
    20k 46.3 : 49.2 [n=29016]   44.8 : 48.8 [n=3119]   49.3 : 48.9 [n=766]    47.6 : 49.0 [n=416]    
    15k 48.5 : 49.2 [n=112384]  44.5 : 48.7 [n=8508]   45.6 : 48.7 [n=3111]   41.8 : 48.5 [n=1850]  
    10k 50.1 : 49.5 [n=139556]  45.5 : 49.0 [n=11968]  45.8 : 48.7 [n=5262]   43.6 : 48.8 [n=3529]   
    5k  49.3 : 49.7 [n=73036]   44.4 : 48.3 [n=8027]   44.2 : 48.0 [n=3911]   43.0 : 48.1 [n=2182]  
    1d  50.7 : 50.2 [n=10274]   40.1 : 48.6 [n=893]    47.6 : 49.1 [n=525]    43.2 : 49.0 [n=257]    
    6d  48.3 : 49.6 [n=385]     63.5 : 49.6 [n=115]    75.0 : 49.7 [n=44]     91.7 : 51.4 [n=24] 
    ALL 49.1 : 49.5 [n=375601]  44.9 : 48.7 [n=33713]  45.7 : 48.5 [n=13882]  43.4 : 48.6 [n=8398]    

Overall ranks predicting live
    30k 49.4 : 49.8 [n=17236]   48.7 : 49.2 [n=4345]   41.9 : 49.1 [n=382]    56.4 : 49.8 [n=172]    
    25k 48.6 : 49.6 [n=48592]   45.5 : 48.9 [n=32577]  41.1 : 48.9 [n=3398]   39.7 : 49.4 [n=584]    
    20k 47.9 : 49.5 [n=158757]  44.6 : 48.7 [n=90988]  40.1 : 48.8 [n=16091]  42.0 : 49.0 [n=3351]   
    15k 48.6 : 49.4 [n=415690]  44.4 : 48.4 [n=115293] 41.6 : 48.5 [n=33868]  42.6 : 48.9 [n=10423]  
    10k 49.4 : 49.4 [n=514601]  44.2 : 48.5 [n=81733]  43.5 : 48.4 [n=36146]  44.2 : 48.9 [n=11341]  
    5k  49.6 : 49.4 [n=344875]  44.2 : 48.5 [n=24837]  44.7 : 48.3 [n=11061]  43.3 : 48.4 [n=4201]  
    1d  49.3 : 49.8 [n=49597]   42.2 : 49.1 [n=2564]   46.5 : 49.3 [n=1378]   44.7 : 49.0 [n=756]    
    6d  49.5 : 51.0 [n=1676]    41.7 : 50.3 [n=103]    40.0 : 52.6 [n=15]     66.7 : 54.8 [n=9]  
    ALL 49.0 : 49.4 [n=1551024] 44.5 : 48.6 [n=352440] 42.4 : 48.5 [n=102339] 43.3 : 48.9 [n=30837]   

Overall ranks predicting corr
    30k 48.6 : 49.6 [n=1900]    44.5 : 49.7 [n=146]    52.5 : 49.3 [n=59]     52.0 : 49.6 [n=25] 
    25k 47.4 : 49.7 [n=5706]    45.2 : 49.6 [n=682]    50.9 : 49.6 [n=214]    46.5 : 49.3 [n=101]    
    20k 49.0 : 49.6 [n=15178]   46.8 : 49.6 [n=1571]   45.7 : 49.6 [n=775]    43.7 : 49.9 [n=394]    
    15k 50.4 : 49.8 [n=37649]   47.2 : 49.8 [n=2492]   48.7 : 49.9 [n=1626]   43.7 : 49.7 [n=931]    
    10k 50.7 : 49.8 [n=49630]   48.1 : 49.7 [n=2397]   49.3 : 49.6 [n=1818]   50.9 : 49.9 [n=971]    
    5k  51.8 : 50.1 [n=31968]   45.7 : 49.5 [n=1069]   47.7 : 49.9 [n=832]    42.4 : 49.6 [n=498]    
    1d  52.6 : 50.6 [n=6854]    41.5 : 49.3 [n=224]    46.7 : 49.8 [n=120]    50.0 : 50.1 [n=56]  
    6d  50.4 : 50.7 [n=671]     50.0 : 55.2 [n=2]      --                     --  
    ALL 50.6 : 49.9 [n=149556]  46.8 : 49.7 [n=8583]   48.4 : 49.7 [n=5444]   46.1 : 49.8 [n=2976]    


Overall ranks predicting overall
    30k 49.3 : 49.8 [n=21966]   48.5 : 49.2 [n=4674]   44.7 : 49.2 [n=494]    54.7 : 49.8 [n=225]    
    25k 48.3 : 49.6 [n=62418]   45.5 : 48.9 [n=34159]  41.9 : 48.9 [n=3822]   41.2 : 49.4 [n=797]    
    20k 47.8 : 49.4 [n=202951]  44.6 : 48.7 [n=95678]  40.7 : 48.9 [n=17632]  42.7 : 49.1 [n=4161]   
    15k 48.7 : 49.4 [n=565723]  44.5 : 48.5 [n=126293] 42.2 : 48.6 [n=38605]  42.6 : 48.9 [n=13204]  
    10k 49.6 : 49.4 [n=703787]  44.5 : 48.6 [n=96098]  44.1 : 48.5 [n=43226]  44.5 : 48.9 [n=15841]  
    5k  49.7 : 49.5 [n=449879]  44.3 : 48.5 [n=33933]  44.7 : 48.3 [n=15804]  43.1 : 48.4 [n=6881]  
    1d  49.8 : 49.9 [n=66725]   41.6 : 49.0 [n=3681]   46.8 : 49.2 [n=2023]   44.6 : 49.1 [n=1069]   
    6d  49.6 : 50.7 [n=2732]    53.2 : 50.0 [n=220]    66.1 : 50.4 [n=59]     84.8 : 52.3 [n=33] 
    ALL 49.2 : 49.5 [n=2076181] 44.6 : 48.6 [n=394736] 43.1 : 48.6 [n=121665] 43.5 : 48.9 [n=42211]

3. Does it make sense to use ranks below 25k?

To answer this we looked at the win rates of handicap games between people that would be in the 30-25k range and compared them to other ranks:

This hump in the purple line, the 30-25k players, is black winning an unexpected amount when given handicap stones. One possibility I explored a fair amount is that our rank function could be improved down in this range to smooth that out to something more expected; however, I was unable to find any fit that didn’t result in similar amounts of chaos. I think this is because there are other dominating factors beyond just the number of stones given: white doesn’t necessarily know how to approach handicap games yet, seeing that many stones probably psyches them out a bit, and blunders most likely still matter more than a few extra stones. So, basically, I think the main purpose of the rank (being able to use it to calculate how many handicap stones you should give) begins to fall apart at this range.

That being said, it’s not all that bad, and there are some benefits to having those ranks beyond strictly correct functional computation of handicaps, namely so people can see their progress. I don’t know that we should go any lower than 30k, but we might very well bring back ranks down to 30k just so people don’t feel like they are perpetually stuck at 25k. We may still stick with just not giving handicap stones automatically to players in that range, or reduce how many are given, but all of that is to be determined later.

Ty-Phoon · June 24, 2020, 6:29pm

This is why I pay for this site.

mark5000 · June 24, 2020, 6:33pm

It almost goes without saying at this point, but thank you @anoek for maintaining this site, for interacting with the community, and for putting the community in a position where we can see the fruits of all your labors. We appreciate you! Good stuff here.

KillerDucky · June 24, 2020, 6:34pm

In my profile, I’m 2.2d by my name, but 1.4d in the graph. Is there something wrong?

kingkaio · June 24, 2020, 6:21pm

Which one should I listen to for my rating?
This one says 17.9

While this one says 18.1

anoek · June 24, 2020, 6:26pm

The system is still catching up on updating those; it should all be settled in a few more hours.

DVbS78rkR7NVe · June 24, 2020, 6:41pm

Bummer, my rating went down a bit. Elusive dan got ever so slightly further from me.

I believe the graph rating is the new rating and the one on the profile card is the old “cached” rating.

Gia · June 24, 2020, 6:44pm

Uh, I have a question.

Please consider bringing ranks down to 30k.

I’m curious who’s “we”. I thought we only had @anoek to torture, didn’t know there’s a whole team of people that suffers because of us.

Thank you. For all the work, and the updates, and the company in solitude that is OGS.

Sallmard · June 24, 2020, 6:48pm

Brilliant to see such analysis and care going into the rating system

anoek · June 24, 2020, 6:53pm

The royal we

anoek · June 24, 2020, 6:54pm

@kingkaio and @KillerDucky - and everyone else - Yeah, those rating values are still synchronizing. They should all be fixed up in the next few hours, or when you next complete a game, whichever comes first.

meili_yinhua · June 24, 2020, 7:11pm

I’d argue that the main benefit of having those ranks is that matchmaking depends on them for restricting rank, and as such you should likely look into how even-game matchups work for that rank and how the matchmaking system should respond.

Jade_9000 · June 24, 2020, 7:16pm

“Elusive Dan” should be a user name

Anthe · June 24, 2020, 7:38pm

It’s hard to follow… before, it was relatively clear to see the changes game by game; now the main rating doesn’t change (I assume for the mentioned number of games).
The blitz ratings, for example, are noticeably different from before (have they been recalculated for the last x games?).
The graph seems to be current, but then it also doesn’t move for some games (can it be that the last game isn’t taken into account when the main rating is updated?).

first thoughts of a long time sw engineer and a short time go player

anoek · June 24, 2020, 7:46pm

The system is still doing some catch-up work, @Anthe; it should be synchronized soon.

Dorus · June 24, 2020, 8:08pm

Some ideas and alternate solutions, mostly based on my experience from running high level bots:

Like any ranking system, things get weird near the extremes, both high and low. OGS is no different, but I’m highly impressed by how well it works. The biggest pain for my bots is cheaters who leech rank off the bot from low-ranked accounts, thus messing up the entire rating system, but specifically my bot. Now the bot doesn’t have feelings, so it doesn’t care, but in turn the bot will then also harm users bold enough to play fair ranked games versus it.

The 15 game block system is one important way to mitigate both cheaters and quick risers, as the rank leech will spread out over multiple users. I still believe it would be good to recalculate games versus quick risers (or quick fallers), but the 2 solved issues (annulling old games and not losing Elo on wins) have some value too. Possibly the best of both worlds can be found: recalculate everything when an old game is annulled, and keep a hidden rating for users that their visible rating slowly moves towards, to avoid rank loss on wins. This hidden rank should only differ from the real rank when a recalculation happened recently.

Now to the topic of sharing rank across 9, 13 and 19 board sizes. This seems okay to me; strength correlates strongly between sizes. One problem, however, with high-ranked bots again: perfect komi for 9x9 is 7, and OGS uses 5.5. This gives a huge advantage to black, and even a mediocre bot can win vs almost godlike bots like this. On CGOS, KataGo at high playouts is the strongest possible bot, standing high over anything else, yet my high-playout KataGo bot on OGS manages to lose 9x9 games to 1 kyu players that can only be cheating.

This is probably not something that can be solved through the rating system, but something to keep in mind for combining the ranks. Unless komi gets fixed.

buzzsaw · June 24, 2020, 8:43pm

Perpetual Kyu should be a rank.

ivanloy · June 24, 2020, 8:45pm

This was nice, but unexpected, I suddenly got from 18-17kyu to 16kyu and my wins against stronger players went from 4 to 16 lol

Clossius1 · June 24, 2020, 8:56pm

I really like how well thought out your decisions are.

I have always had the idea that 9x9 and 13x13 are great for beginners but not for stronger players. I feel strongly about this but unlike you, I have no evidence.

I’m wondering if you would consider looking at starting beginners off on 9x9 for 30k-25k and 13x13 24-22k and 21k+ using 19x19 or some other standard. I picked 21k because teaching on stream has given me the feeling that 21k are strong enough to play 19x19 to some extent without falling apart really fast.

The reason for my opinion on this is that at the higher ranks it is very chaotic on OGS. I can beat a 4D but lose to a 2k. I’ve had some 3k-1k that feel like they should be dans and others that feel like they should be that level. My feeling is that playing ranked small boards is affecting play at a higher level.

With that being said, as I mentioned above this is more of a feeling and I don’t have the evidence to support it. My hope is that you will look into it and make a post about the results.

TLDR: Would you consider looking into, and collecting data on, limiting ranked games on smaller boards to the lower levels only, so it doesn’t affect higher-level ranks as much?