I’ve been looking into this more and have a better idea of what is going on. I posted some details on the GitHub issue, but basically it seems like we’re supposed to be using the “Old Japanese Recommendation” for 9x9, but with only one choice of komi for each number of handicap stones.
In practice, handicaps are being assigned as if the rank differences were significantly larger than they are.
Have 19x19 handicaps been working correctly since the update? (How are they calculated?)
I think there may be some inaccuracies in how the rating adjustments over the years have been propagated backwards through time.
Take a look at the rating graph on my profile page. The rating system seems to think that I peaked at 2 kyu in January 2015. However, back then, the system believed that I was around 10 kyu (judging by the ranks saved in the chat of some game records from that time).
There might be some merit to that idea, as that was pre-Glicko in general, so we were still choosing our initial ratings and then very often playing people with similar ratings. As such, depending on how often we played, our recalculated ratings might not have been pushed down enough by play to reach a similar place when recalculated from the common 1500 starting point.
But this may also just be rationalizing a phenomenon that shouldn’t happen, since the populations of people at different (initial) ranks should mix somewhat continuously.
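To make the idea concrete, here is a toy sketch of what I mean. It uses plain Elo with a fixed K-factor rather than Glicko (a deliberate simplification, not how the site actually computes ratings), and every name and number in it (`simulate_pool`, `pool_means`, the pool sizes, the "true" strengths) is made up for illustration. Two groups of very different true strength both start at 1500 and only ever play within their own group:

```python
import random

K = 32  # assumed fixed Elo K-factor; the real system is Glicko-based


def expected(ra, rb):
    """Elo expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def simulate_pool(true_strengths, games, rng, start=1500.0):
    """Play random games inside one isolated pool, everyone starting at `start`."""
    ratings = [start] * len(true_strengths)
    for _ in range(games):
        a, b = rng.sample(range(len(true_strengths)), 2)
        # The game result is driven by the hidden true strengths.
        p_a_wins = expected(true_strengths[a], true_strengths[b])
        score_a = 1.0 if rng.random() < p_a_wins else 0.0
        # The rating update only sees the current ratings; it is zero-sum.
        delta = K * (score_a - expected(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


def pool_means(seed=0):
    """Average rating of a strong pool and a weak pool that never meet."""
    rng = random.Random(seed)
    strong = [2000 + rng.gauss(0, 50) for _ in range(20)]  # hypothetical strengths
    weak = [1000 + rng.gauss(0, 50) for _ in range(20)]    # hypothetical strengths
    strong_r = simulate_pool(strong, 2000, rng)
    weak_r = simulate_pool(weak, 2000, rng)
    return sum(strong_r) / len(strong_r), sum(weak_r) / len(weak_r)


if __name__ == "__main__":
    print(pool_means())
```

Because each update in this toy model is zero-sum, both pools keep an average of 1500 no matter how far apart their true strengths are. Ratings only get pulled apart once games happen across the pools, which is the mixing I would expect to occur in practice, so this shows the mechanism but not how large the effect really is.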