Rank question & board size

Samraku · March 6, 2021, 2:24am

Could you clarify what you mean? I suspect we agree, but I’m having trouble accepting that your terminology of 9x9 having more upsets is accurate.

As I see it, the depth of skill differentiation (the gap between the highest and lowest Elo, or equivalent: Glicko-2 in this case, among players of the game) is decidedly smaller in 9x9 than in 19x19. But players separated by the same number of Elo (or equivalent) points in 9x9 and 19x19 are still going to have the same W-L ratio in both modes…

So how are upsets harder if they’re the same difficulty by definition?

dragon-devourer · March 6, 2021, 10:13am

9x9 is ~50 moves per game vs ~250 for 19x19, so smaller boards are faster to play. With the same weighting, your rating will adjust faster if you play more on smaller boards. Weighting board sizes would even that out.

Similarly, blitz is ~15 mins vs ~1.5 hours for live, so again, rating will adjust faster for blitz.

Plus, the main point about time settings: I hypothesize that excessively fast / slow time settings result in an inaccurate measure of true strength. For example (and it would be interesting to look at this data for other players too), my overall rank is 8k, my “live” 19x19 rank (mostly based on borderline blitz time settings actually) is 9.3k and my correspondence 19x19 rank is 4.5k. A possible reason is that I only play a move in a correspondence game when I feel in the mood - if it seems difficult and I’m tired I will skip it till later. But in a live game I have to keep going even if I’m losing focus.

Incidentally, it seems all of my blitz games (1m+5x10s) have been rated as live games. Is this a known bug?

Kosh · March 6, 2021, 10:14am

Yes.

shinuito · March 6, 2021, 10:44am

If I just play two live games a day, or generally a fixed number of games a day, they’ll adjust at the same speed. It really depends on people’s habits.

If I play a new game as soon as I finish a previous one and play as many games as quickly as I can, then maybe the ratings adjust “faster”.

I’m very similar.

Is it not useful to just have the separate ratings in the table as indicators of strength at different board/time settings and/or of frequency of play? Do they need to weigh in differently to the overall rank?

I can understand that some people don’t want to mess with their overall rank if it’s from 19x19 games by playing 9x9 or 13x13 etc, or if they think their rank will go down if they play time settings other than their usual. I’m still not sure I understand why one should weight them less, and in particular pick an arbitrary weighting. Why not make 9x9 0.25 and 13x13 0.5, because 9x9 has about half as many points as 13x13, and 13x13 about half as many as 19x19?
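As a quick check on the "about half" figures, here is a minimal sketch that computes the intersection counts and the weights you would get if weighting were proportional to board area. The area-proportional scheme and the `area_weight` name are purely illustrative assumptions, not anything OGS is known to use.

```python
# Number of intersections on each standard board size.
SIZES = (9, 13, 19)
points = {n: n * n for n in SIZES}

# Hypothetical weighting: each size weighted by its area relative to 19x19.
area_weight = {n: points[n] / points[19] for n in SIZES}

for n in SIZES:
    print(f"{n}x{n}: {points[n]} points, area weight {area_weight[n]:.2f}")
```

The ratios come out close to the 0.25 and 0.5 suggested above, which only shows that the guess matches board area, not that area is the right basis for a weighting.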

github.com/online-go/online-go.com

Automatch blitz with Fischer time counted as a live game

opened 08:57AM - 01 Feb 21 UTC

closed 12:08AM - 07 Jan 24 UTC

TeXitoi

stale

**Describe the bug** I've selected Fischer time in the preferences of the blitz game. I've done such a game, and it appears as a live game in the history and rank calculation. See for example https://online-go.com/game/30655665

**To Reproduce** Steps to reproduce the behavior:

1. Go to 'play'
2. Click on 'settings' of 'Quick match finder'
3. Blitz Time control "prefer Fischer"
4. close preferences
5. blitz automatch
6. The time limit will be "Fischer: Clock starts with 30 seconds and increments by 10 seconds per move up to a maximum of 1 minute."
7. Play the game
8. Look at the game in your game history, it is "live" and not "blitz".

**Expected behavior** When playing a blitz game with automatch, the game is counted as blitz.

teapoweredrobot · March 6, 2021, 1:18pm

But isn’t the question whether or not the adjustment results in lower accuracy? Surely it doesn’t matter if the adjustment is fast or slow as long as it is correct overall.

Even if smaller boards or faster/slower games produce noisier data, simply slowing the adjustment in those cases is not going to improve accuracy. Or would it? I’m not a statistician/mathemagician, nor even a very good Go player, so I can’t really know.
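One way to get a feel for the "or would it?" question is a toy simulation. The sketch below assumes a plain fixed-K Elo update rather than the Glicko-2 system OGS actually uses, and the `simulate` function, the K values, the opponent range and the seed are all made up for illustration. It tracks a player whose true strength never changes and compares a fast adjustment (K=32) with a slow one (K=8).

```python
import random
import statistics


def expected_score(gap: float) -> float:
    """Win probability for a player rated `gap` points above the opponent (Elo logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


def simulate(k: float, games: int = 20000, true_rating: float = 1500.0, seed: int = 0):
    """Run a fixed-K Elo update against random opponents; return (mean, spread) of the estimate."""
    rng = random.Random(seed)
    estimate = true_rating  # start at the true value so only steady-state scatter is measured
    history = []
    for _ in range(games):
        opponent = true_rating + rng.uniform(-100.0, 100.0)
        # The game result is drawn from the player's true strength...
        score = 1.0 if rng.random() < expected_score(true_rating - opponent) else 0.0
        # ...but the update only sees the current estimate.
        estimate += k * (score - expected_score(estimate - opponent))
        history.append(estimate)
    return statistics.mean(history), statistics.pstdev(history)


fast_mean, fast_spread = simulate(k=32.0)
slow_mean, slow_spread = simulate(k=8.0)
print(f"K=32: mean {fast_mean:.0f}, spread {fast_spread:.0f}")
print(f"K=8:  mean {slow_mean:.0f}, spread {slow_spread:.0f}")
```

In this toy setup both settings hover around the same long-run value, and the slower one just wobbles less from game to game. So under these assumptions, slowing the adjustment makes any single snapshot of the rating less jumpy, at the cost of reacting more slowly when real strength changes. It does not fix a systematic bias.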

benjito · March 6, 2021, 1:27pm

You’re totally right: by definition, people separated by the same Elo (or Glicko) gap will have the same win rate whether on 9x9 or 19x19.

However, that makes the assumption that the two systems (9x9 and 19x19) are completely separate. The truth is that the two systems are not separate, and skill is not separate either. We use skill on 19x19 to predict skill on 9x9 and vice versa.

What I am positing is that if I play a person 4 stones stronger than me, I will generally be able to beat them more often in an even game on 9x9 than on 19x19. I could be wrong, but that’s my theory.

EDIT: and yes, I suspect we do agree. Since you say the breadth of the Elo rankings is greater on 19x19, it seems necessary to apply different weights in order to generate an overall rank. Of course, if everyone plays even games or a balanced number of handicap games, the results are more or less the same. But it does seem possible to skew your overall rank upward by only playing stronger opponents on 9x9.

shinuito · March 6, 2021, 1:36pm

I think I smell another 9x9 tourney brewing

benjito · March 6, 2021, 1:39pm

Can one make a tourney with two board sizes? And specify that every game has to be a rank difference of 4 stones? That sounds like a blast haha

benjito · March 6, 2021, 1:48pm

Of course noisy data is fine. You will end up at the same result eventually (law of large numbers and whatnot).

The part where things start to get wonky is if you’re only playing people above or below you in ranking. Glicko says that you will win an even game against a person 200 pts lower than you with a fixed probability (roughly 76% on the standard logistic curve, ignoring rating deviation). If you somehow add noise to the games, you will win less often (let’s say 65%). If that’s the case, and you only play these types of lopsided games, your rank will appear lower than it really is. Same if you only play stronger opponents, but in the other direction.

Disclaimer: I have a math background, but not statistics specifically, and I’m just discussing, so I make no guarantees as to the rigor of my claims.
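The lopsided-games argument can be put into rough numbers. This sketch uses the plain Elo logistic curve, which matches Glicko's expected score only when rating deviation is ignored (an assumption). The 200-point gap and the 65% noisy win rate are illustrative figures, and `implied_gap` is a hypothetical helper name.

```python
import math


def expected_score(rating_gap: float) -> float:
    """Win probability for the higher-rated player under the Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def implied_gap(win_rate: float) -> float:
    """Inverse of expected_score: the rating gap at which the curve predicts `win_rate`."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


true_gap = 200.0                        # assumed real gap to the weaker opponents
predicted = expected_score(true_gap)    # what the rating system expects you to score
noisy_win_rate = 0.65                   # hypothetical: noise drags the actual win rate down
apparent_gap = implied_gap(noisy_win_rate)
shortfall = true_gap - apparent_gap     # how far the rating would settle below its true value

print(f"predicted win rate at {true_gap:.0f} pts: {predicted:.2f}")
print(f"gap consistent with {noisy_win_rate:.0%} wins: {apparent_gap:.0f} pts")
print(f"apparent rating shortfall: {shortfall:.0f} pts")
```

If a player only ever meets these weaker opponents, the rating drifts to the point where predicted and actual win rates agree, which here is a gap noticeably smaller than the true one. Playing only stronger opponents mirrors the effect in the other direction.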