Testing the Volatility: Summary

[continued from points brought up in Edit: Turns out this topic is about amending the TOS or something, hop in 🤷]

Maybe taking a step back will help. How would a true rank be defined? I’m assuming it should be based solely on existing matches so we can try to make a probabilistic model and predict future results, but there are other methods (knowledge exams, etc.).

Let’s say I have a set P of all OGS players, and a relation R over P containing tuples representing every game played, where (x,y) in R indicates player x beating player y. I’ll abuse the notion of a relation to allow duplicate tuples to represent multiple games with the same result.

Given such a relation taken from OGS at some point in time, how would you define the “true” rank of each player based on the existing game record relation R?

For example, you might have this data (you can pretend there’s a lot more):

(GoLover, Player1)
(Player1, badukforever)
(Player1, badukforever)
(GoLover, atari_everything)
(atari_everything, noob_bot)
(Player1, noob_bot)
(badukforever, atari_everything)
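
To make that concrete, here is a minimal Python sketch of the relation as a multiset. Using `collections.Counter` to hold the duplicate tuples, and the names `games` and `record`, are my own choices for illustration and not anything OGS-specific.

```python
from collections import Counter

# Each key is (winner, loser); the count is how often that result occurred.
games = Counter([
    ("GoLover", "Player1"),
    ("Player1", "badukforever"),
    ("Player1", "badukforever"),
    ("GoLover", "atari_everything"),
    ("atari_everything", "noob_bot"),
    ("Player1", "noob_bot"),
    ("badukforever", "atari_everything"),
])

# Tally [wins, losses] per player from the multiset.
record = {}
for (winner, loser), n in games.items():
    record.setdefault(winner, [0, 0])[0] += n
    record.setdefault(loser, [0, 0])[1] += n

for player, (wins, losses) in record.items():
    print(f"{player}: {wins}W {losses}L")
```

Raw win/loss counts like these ignore who the opponents were, which is exactly why they are not enough to define a rank on their own.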

Is it possible to stratify/rank these players in a consistent and useful manner? Maybe you could say each member of a given stratum should win around 50% of games played within that same stratum, and more often than that when playing lower ranks, but is it even possible to fit everyone into such a system consistently?
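
One way to test the strictest version of "consistent" is to ask whether any ordering exists in which every winner sits above every player they beat. The sketch below does that with a topological sort from the standard library's `graphlib` (Python 3.9+). The name `beaten_by` is made up for illustration, and treating a single win as proof of being stronger is a deliberately naive assumption.

```python
from graphlib import TopologicalSorter, CycleError

games = [
    ("GoLover", "Player1"),
    ("Player1", "badukforever"),
    ("Player1", "badukforever"),
    ("GoLover", "atari_everything"),
    ("atari_everything", "noob_bot"),
    ("Player1", "noob_bot"),
    ("badukforever", "atari_everything"),
]

# Map each player to the set of players who have beaten them.
beaten_by = {}
for winner, loser in games:
    beaten_by.setdefault(loser, set()).add(winner)
    beaten_by.setdefault(winner, set())

try:
    # static_order() yields a player only after everyone who beat them.
    order = list(TopologicalSorter(beaten_by).static_order())
except CycleError:
    # Some player beat someone who (directly or indirectly) beat them back.
    order = None

print(order)
```

The toy data happens to admit such an ordering. Add a single upset, say `("noob_bot", "GoLover")`, and `CycleError` fires, so no strict ordering exists. I would expect real game records to contain cycles like that, which is one argument for a probabilistic definition over a strict one.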

Now, according to whatever definitions make up this imaginary true rank, what should be the expected outcome when Player1 challenges atari_everything? Can we assume any sort of transitivity? Further, what happens when this expectation is violated repeatedly in future games because there was no data about those two specific players challenging each other when the initial rank was constructed? Is the true rank wrong, or is volatility just a natural consequence?
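
For a feel of what one possible answer looks like, here is a rough sketch using a Bradley-Terry model, where each player gets a strength s and P(x beats y) = s_x / (s_x + s_y). This is one common choice, not necessarily what OGS does, and transitivity is baked into the model rather than discovered from the data. The `prior` pseudo-games against a dummy opponent of strength 1 are my own assumption. Without something like them, an undefeated player (GoLover) or a winless one (noob_bot) has no finite maximum-likelihood strength.

```python
from collections import Counter

games = Counter([
    ("GoLover", "Player1"),
    ("Player1", "badukforever"),
    ("Player1", "badukforever"),
    ("GoLover", "atari_everything"),
    ("atari_everything", "noob_bot"),
    ("Player1", "noob_bot"),
    ("badukforever", "atari_everything"),
])
players = sorted({p for pair in games for p in pair})
wins = {p: sum(n for (w, _), n in games.items() if w == p) for p in players}


def fit_strengths(prior=0.5, iterations=200):
    """Bradley-Terry strengths via iterative (MM-style) updates."""
    s = {p: 1.0 for p in players}
    for _ in range(iterations):
        new = {}
        for i in players:
            # Assumed prior: `prior` wins and `prior` losses vs. a dummy of strength 1.
            denom = 2 * prior / (s[i] + 1.0)
            for j in players:
                if j != i:
                    n_ij = games[(i, j)] + games[(j, i)]  # games between i and j
                    denom += n_ij / (s[i] + s[j])
            new[i] = (wins[i] + prior) / denom
        s = new
    return s


def p_win(s, a, b):
    """Model probability that a beats b."""
    return s[a] / (s[a] + s[b])


strengths = fit_strengths()
p = p_win(strengths, "Player1", "atari_everything")
print(f"P(Player1 beats atari_everything) = {p:.2f}")
```

Note that the prediction for Player1 vs atari_everything comes entirely from games against third parties, since the two never met. If their real head-to-head results keep contradicting it, the model has to move both strengths, and with them the predictions for everyone they have played. That is the volatility in question.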

Further, what happens when a new player joins? Do we make them play every other player until ranks stabilize? What about pools of people who mostly play each other that will inevitably form? What happens to the rankings when they play another player outside that pool?
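
The pool problem shows up directly in the game graph. In the extreme case where two groups have never played each other, nothing in R says how they compare. The sketch below finds such fully isolated pools as connected components. The three `club_*` players are hypothetical additions to the toy data, and real pools that "mostly" play each other would be weakly connected rather than fully separate.

```python
from collections import defaultdict

games = [
    ("GoLover", "Player1"),
    ("Player1", "badukforever"),
    ("Player1", "badukforever"),
    ("GoLover", "atari_everything"),
    ("atari_everything", "noob_bot"),
    ("Player1", "noob_bot"),
    ("badukforever", "atari_everything"),
    # Hypothetical second pool that never plays the first one.
    ("club_a", "club_b"),
    ("club_b", "club_c"),
]

# Undirected "has played against" graph.
neighbors = defaultdict(set)
for winner, loser in games:
    neighbors[winner].add(loser)
    neighbors[loser].add(winner)


def find_pools(neighbors):
    """Return the connected components of the game graph as sets of players."""
    seen, result = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, pool = [start], set()
        while stack:
            player = stack.pop()
            if player in pool:
                continue
            pool.add(player)
            stack.extend(neighbors[player] - pool)
        seen |= pool
        result.append(pool)
    return result


pools = find_pools(neighbors)
print(len(pools), "pools")
```

A single cross-pool game, such as `("club_a", "noob_bot")`, merges the two components, and that one result then carries all of the information about how the pools compare.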