I believe this just doesn’t make sense. If the whole population’s ratings drift together, then the automatic matchmaking will still match people of similar strength together. The auto handicap system is trickier of course, but it wouldn’t necessarily get worse either, especially if the drifting is somewhat homogeneous.
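To make that concrete, here is a minimal sketch. It assumes a plain Elo-style win expectancy as a stand-in (this is not OGS's actual Glicko-based code), and the function names, the three-player pool and the 150-point drift are all made up for illustration. The point is only that the prediction depends on the rating difference, so adding the same offset to everyone changes neither the predicted result nor who gets paired with whom.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo-style win expectancy for A against B (logistic curve, scale 400).

    Assumption: used here as a simple stand-in for a real rating system.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def closest_opponent(ratings: dict[str, float], player: str) -> str:
    """Hypothetical matchmaking rule: pick the opponent with the nearest rating."""
    others = (name for name in ratings if name != player)
    return min(others, key=lambda name: abs(ratings[name] - ratings[player]))


# Made-up pool, and the same pool after every rating drifts up by 150 points.
pool = {"A": 1600.0, "B": 1650.0, "C": 1900.0}
drifted = {name: rating + 150.0 for name, rating in pool.items()}

# Both the pairing and the predicted result are the same before and after.
print(closest_opponent(pool, "A"), closest_opponent(drifted, "A"))
print(expected_score(pool["A"], pool["B"]), expected_score(drifted["A"], drifted["B"]))
```

If the drift is not uniform (say, weaker players inflate more than stronger ones), the differences do change, which is where handicap assignment could start to suffer.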
The main reason drifting is undesirable, I believe, is that the same player can end up with different ranks in different populations, but that’s more a problem of the optics and cultural implications of holding a specific rank than a problem with the rating system itself.
I agree that, to solve the equalization of ranks across populations, some kind of bot-based system is the most promising idea to investigate – but I also believe it’s actually surprisingly tricky.
The main obstacle is that most bots which play at kyu level are down there because they have some specific “flaw” or “blind spot”.
A “4 kyu” bot is usually, perhaps always, not a good representation of what being a “4 kyu human” means. In practice, this means players would be able to manipulate their ranks by just learning the bot’s specific “blind spots” and beating it consistently.
I can imagine some ways to try to circumvent this drawback, but I think it would be hard to build an actually solid system.
In a way, the issue really comes down to the fact that we don’t really know what it means for a Go-playing entity to “be 4 kyu”, or, to use a population-specific rating, to be, say, 1600 Glicko on OGS.
What does that “mean” in terms of your play, really? I don’t think anybody in the world can answer that at the moment.
I think the most promising idea is actually to have a machine-trained bot that learns “what a game played by a (insert arbitrary rating number here) player looks like”, but until someone actually tries it, there’s no way of predicting whether such a system could be reliable at all.