I can believe that some of the bots are so weird that playing with them isn’t really like playing Go. For example, you could make a weak bot by starting with KataGo and adding occasional random moves. But a game against that bot would be more like playing a strange Go variant, and I can believe that its rank would depend mostly on how much the current crop of players is into Go variants, more than on anyone’s actual skill at Go. (Note: this is measurable in principle, by comparing multiple bots over time.)
To take it to an extreme, I wouldn’t expect to get a great ranking system by anchoring to our players’ performance playing chess against a chess bot.
So I agree it could easily be problematic.
Among our bots I think I’ve only played seriously against GnuGo - just a couple of times - and I don’t find it that weird. I played a game against it just now and it felt like a somewhat distracted 10k who sometimes manages to pay attention and show great reading skill. If I encountered it in a normal pairing I probably wouldn’t suspect that anything was unusual about the game.
I’m sure I’d notice more weird behavior if I played it more. Not sure how much that matters.