Not at all.
It would show that they did the responsible thing by performing reasonable tests to validate their modifications to the rating system.
Or that they didn’t.
But what’s clear is that the community doesn’t really understand that they aren’t using off-the-shelf Glicko, and thus the rating system here has no theoretical underpinning except that testing.
If that’s the only thing validating their choices, they’re absolutely obligated to share it with the community.
“Just trust us, we’re experts!” isn’t very convincing, especially because many members of the community feel there are issues. (Hint: Maybe some of the people who are complaining are experts on the stability of these kinds of systems and think the math looks funky.)
It tells me that the devs implemented a custom solution, but that doesn’t mean I should just trust that it works, particularly in the face of empirical evidence that it doesn’t. Typing in code most assuredly does not require understanding the mathematics behind it, and typing in code is all that seems to have happened here.
In fact, that they keep saying they implemented Glicko when they didn’t implement Glicko smells of dishonesty.
That they’re unwilling to say how they tested their modifications is a great reason to doubt their expertise on the topic: they don’t want to say because they know what they slapped together half-assedly is a serious departure from the theory, without any solid proof that it’s stable.
That’s irresponsible leadership. Period.
There’s absolutely no way to do this, since they wrote a special-snowflake system based on an actual one, then refuse to actually discuss how it’s implemented or tested.
You’re being dishonest: I did exactly that in describing the problem originally. Then people went “lulz, we use Glicko,” which is untrue; the site does not use Glicko, and the admins are simply silent on the underpinnings of their actual methods.
How, exactly, do you expect me to refute their magic “we tested it!” when the admins won’t discuss how they tested their bespoke not-Glicko rating system?
I’ve already dug into it substantially: enough to realize that the modifications they made and the continuous ratings probably aren’t stable, and can cause the sorts of dynamic problems people have been calling out around 13-kyu/1500. But there really isn’t a way forward without seeing their work on whether it actually is stable. (And on the degree to which they analyzed that before rolling out a new ratings system.)
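For what it’s worth, the kind of stability test I keep asking them to publish isn’t hard to sketch. Here’s a toy simulation using a plain Elo-style update — not the site’s actual algorithm, which I obviously can’t reproduce since they won’t document it. You give players fixed hidden skills, play random games, and then check that the estimated ratings converge toward the true ordering and stay bounded instead of drifting or oscillating. That, published with their real update rule plugged in, would settle a lot of this argument.

```python
import random

def expected_score(r_a, r_b):
    """Standard Elo expected score for player a against player b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate(num_players=20, num_games=20000, k=32, seed=0):
    """Toy stability check: fixed hidden skills, random pairings,
    Elo-style updates. All parameter values here are illustrative."""
    rng = random.Random(seed)
    # Hidden "true" skills the simulator knows but the rating system doesn't.
    true_skill = [1000 + i * 50 for i in range(num_players)]
    # Everyone starts at the same provisional rating.
    rating = [1500.0] * num_players
    for _ in range(num_games):
        a, b = rng.sample(range(num_players), 2)
        # Decide the winner from the hidden skills.
        p_a_wins = expected_score(true_skill[a], true_skill[b])
        score_a = 1.0 if rng.random() < p_a_wins else 0.0
        # Update both players; the two deltas are equal and opposite.
        e_a = expected_score(rating[a], rating[b])
        rating[a] += k * (score_a - e_a)
        rating[b] += k * ((1.0 - score_a) - (1.0 - e_a))
    return true_skill, rating
```

A sanity check on the output: the strongest hidden-skill player should end up rated above the weakest, no rating should run off to infinity, and since the Elo update is zero-sum the total rating pool should be conserved. Any bespoke modification — continuous ratings included — should be run through exactly this kind of harness before deployment, and the results shared.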