Testing the Volatility: Summary

Definitely agreed – that’s part of why I like other metrics mentioned earlier, where models are instead judged by the “surprise” of encountering a certain game result given the model’s predicted win percentage rather than a binary correct/incorrect metric or similar. It might even be fine that many games have close to a 50% expected win rate. If a model thinks two players should have a 50/50 chance of winning against each other, that’s great as long as those players have very similar ranks. It should only be “punished” if it predicts a 50% win rate where there isn’t one, indicating some error in the ranks it has assigned.
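To make the "surprise" idea concrete, here is a minimal sketch that scores a prediction by its negative log-likelihood in bits. The function names are my own for illustration and are not taken from any particular rating system.

```python
import math

def surprise(p_win: float, won: bool) -> float:
    """Surprise, in bits, of one game result given the predicted win probability."""
    p = p_win if won else 1.0 - p_win
    return -math.log2(p)

def expected_surprise(p_true: float, p_pred: float) -> float:
    """Average surprise when the real win rate is p_true but the model predicts p_pred."""
    return p_true * surprise(p_pred, True) + (1.0 - p_true) * surprise(p_pred, False)

# A 50% prediction always costs exactly 1 bit, whoever wins.
print(surprise(0.5, True), surprise(0.5, False))

# If the players really are evenly matched, 1 bit per game is the best any model can do.
print(expected_surprise(0.5, 0.5))

# If the real win rate is 80%, predicting 50% costs more than predicting 80%.
print(expected_surprise(0.8, 0.5), expected_surprise(0.8, 0.8))
```

So under this kind of metric a 50% prediction is only penalized relative to a better-calibrated alternative, which is the behavior described above.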

It could involve maintaining a graph of recent interactions in the player base, and using a characteristic like graph distance between two players to weight a parameter representing uncertainty about the difference between the baselines/average rankings of the respective “pools” those two players belong to. I fully agree that this would need to be examined with a lot more rigor, and might not produce any desirable results.
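As a rough sketch of what I mean, and only that: the code below builds an undirected graph from recent games, finds the shortest path between two players, and turns that distance into an extra uncertainty term. The mapping from distance to uncertainty (`sd_per_hop`, `sd_max`) is entirely made up for illustration, and I have no evidence that this particular form would help.

```python
from collections import deque

def graph_distance(games, a, b):
    """Shortest number of hops between players a and b, or None if unconnected.

    `games` is an iterable of (player, opponent) pairs from recent play.
    """
    adj = {}
    for x, y in games:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor == b:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def pool_offset_sd(distance, sd_per_hop=20.0, sd_max=200.0):
    """Hypothetical extra standard deviation on the rating gap between two pools.

    Direct opponents (distance 1) add nothing; unconnected players get sd_max.
    """
    if distance is None:
        return sd_max
    return min(sd_max, sd_per_hop * max(0, distance - 1))

recent_games = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
print(pool_offset_sd(graph_distance(recent_games, "ann", "bob")))  # direct opponents
print(pool_offset_sd(graph_distance(recent_games, "ann", "cat")))  # one intermediary
print(pool_offset_sd(graph_distance(recent_games, "ann", "dan")))  # separate pools
```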

Yeah – this parameter seems to be a direct way of “tuning” the degree to which jumps in a player’s estimated rank over time are allowed, potentially controlling apparent volatility but also potentially reducing the predictive accuracy of the model if set too low. I do see that this “allowable rating difference” will also grow with time (as they mention and you mentioned earlier, “the variance increases linearly with time, so the confidence interval grows like the square root of time”), but in the short term, using such a process as a prior in the MAP estimate should effectively limit how far the estimated rating can jump when updated with games close together in time. We are perhaps understanding the same thing, though I just wanted to relate it to this idea:

In this case, WHR seems to specifically control for this, but my overall goal was to point out in another way that models which try to globally optimize ranks to explain a given set of game results may do so at the expense of consistent individual rankings if not controlled.
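For what it's worth, here is a small sketch of how I understand that prior to work: the change in a player's rating between two times is treated as Gaussian with variance `w**2 * dt`. The value of `w` and the units below are placeholders I picked for illustration, not values from the WHR paper or any implementation.

```python
import math

def prior_sd(w: float, dt: float) -> float:
    """Standard deviation of the rating change allowed over an interval dt."""
    return w * math.sqrt(dt)

def log_prior(r1: float, r2: float, w: float, dt: float) -> float:
    """Log density of moving from rating r1 to r2 over dt under the Wiener prior."""
    var = w * w * dt
    return -0.5 * math.log(2.0 * math.pi * var) - (r2 - r1) ** 2 / (2.0 * var)

w = 10.0  # hypothetical: rating points per square root of a day

# The variance grows linearly with time, so the spread grows like sqrt(time).
for days in (1, 4, 100):
    print(days, prior_sd(w, days))

# The same 100-point jump is penalized far more heavily over 1 day than over 100 days.
print(log_prior(1500.0, 1600.0, w, 1.0))
print(log_prior(1500.0, 1600.0, w, 100.0))
```

A smaller `w` makes the short-interval penalty harsher, which is the "tuning" trade-off between apparent volatility and predictive accuracy mentioned above.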
