Testing the Volatility: Summary

I want to put forth a disclaimer: I’m aware that, even though I probably have more mathematical knowledge than a majority of human beings, I am essentially a layman in the context of this conversation, and trying to participate with my ideas kinda makes me a crackpot.

But I have a few things I’d like to hear your opinions on.



 Part 1: Analogy to the uncertainty principle for waves

TL;DR: in order to understand a wave-like function, looking at a point sample (or a narrow interval) is too little info. We need to look at the fluctuation itself to get that info.

This is inspired by this popular video. If I understood it and recall it correctly, one takeaway is that when you have a wave, say as a function of time, the less time you spend observing it, the less you can be sure of its properties (in that case wavelength).

 In our case, I believe that the “true rating” acts as a noisy wave fluctuating around a fairly stable curve over time, and for any attempt to “sample” it, there’s a lot of random noise that causes our measure to be inaccurate, which you could think of as another noisy wave being added to the “true rating”.

 What we want to do is not study the frequency spectrum of the wave, but just obtain enough experimental data to be able to know the shape of the wave as accurately as possible – but intuitively, the analogy rings true:

we need to look at more of the wave if we wish to understand it. Any single sample could be on the lower side of the wave, the upper side, or in the middle.

This brings me to two considerations:

  1. Even if what we wanted was just to “see through the noise” caused by experimental errors, intuitively we would need to try to guess where the center of the wave is, smoothing out the noise.

  2. Even if we could build a good picture of the wave, we would never know whether a given fluctuation we observe is noise caused by the limits of the sampling system, or a fluctuation in the actual “true rating”.

 Which leads me, intuitively, to the conclusion that we should not try to follow the high-frequency fluctuations at all, and that an approach where we try to keep the rating estimate stable at our best guess of the current “center of gravitation of the wave” is just more likely to be, on average, a good estimate of the “true rating”.
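
To make the intuition in Part 1 a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption I made up for illustration: the “true rating” is a slow sine drift plus a small fast wobble, each observation adds independent Gaussian noise, and the amplitudes, periods and the 30-sample window are arbitrary. It compares how far single samples and a trailing average land from the “true rating”.

```python
import math
import random

def simulate(n=2000, seed=0):
    """Return (true_rating, observed) lists of length n.

    Assumed toy model: a slow drift plus a small fast wobble is the
    "true rating"; each observation adds independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_rating, observed = [], []
    for t in range(n):
        slow = 1500 + 100 * math.sin(2 * math.pi * t / 1000)  # stable curve
        fast = 20 * math.sin(2 * math.pi * t / 15)            # real wobble
        r = slow + fast
        true_rating.append(r)
        observed.append(r + rng.gauss(0, 80))                 # sampling noise
    return true_rating, observed

def trailing_mean(xs, window):
    """Mean of the last `window` values up to and including each index."""
    out, total = [], 0.0
    for i, x in enumerate(xs):
        total += x
        if i >= window:
            total -= xs[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out

def rmse(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

true_rating, observed = simulate()
raw_error = rmse(observed, true_rating)
smooth_error = rmse(trailing_mean(observed, 30), true_rating)
print(f"single samples: {raw_error:.1f}, 30-sample average: {smooth_error:.1f}")
```

In this toy model the trailing average ignores the fast wobble entirely (it cannot tell it apart from noise), yet it still ends up closer to the “true rating” on average than any single sample does. That says nothing about real rating data, of course; it only shows that the intuition is at least self-consistent.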



 Part 2: Why I’m skeptical of instant ratings

TL;DR: for the above reason, I believe they're bound to have a low signal-to-noise ratio.

To be clear, what I said in the last paragraph is what I had been thinking the whole time, at least since I wrote the “schmating” thing. So if you’re thinking “That’s what I’ve been saying this whole time!”, well, it just means you somehow didn’t realize I already agreed with this :laughing:

 I feel this might be another application of the bias-variance tradeoff principle (in fact, it seems to be essentially what @joooom has said multiple times): in such a system, we would be sacrificing the hope of ever accurately capturing the fluctuations of the “true rating”, but we would run less of a risk of being oversensitive to the noise caused by sources of experimental error (such as the intransitivity of players’ ability).

  •  This is the reason I’m highly skeptical of instant ratings. A rating system that updates after every game, with no memory of the shape of the apparent rating fluctuations, feels to me like it’s doomed to one of two failure modes: either it is oversensitive to the noise, likely swirling around pointlessly until it gets so stupidly far from the actual “true rating” that it’s forced to move back, or it is too slow to notice any big-picture trends in the rating.

And so in the end this is why I have a feeling that reducing volatility might actually end up improving the overall “goodness” of the system.
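
Here is a small Python sketch of the tradeoff described in Part 2, under loud assumptions: the update rule `est += k * (obs - est)` is a plain exponentially weighted update that I am using as a stand-in for “a system that reacts after every game”, not the update of any real rating system, and `k` is a hypothetical knob for how reactive it is. The “true rating” is again a made-up slow sine drift with Gaussian observation noise.

```python
import math
import random

def tracking_error(k, n=5000, seed=1):
    """RMSE of a per-game update `est += k * (obs - est)` against a toy truth.

    Assumptions: the "true rating" is a slow sine drift, observations carry
    Gaussian noise, and `k` is a stand-in for how reactive the system is.
    """
    rng = random.Random(seed)
    est, sq_err = 1500.0, 0.0
    for t in range(n):
        truth = 1500 + 100 * math.sin(2 * math.pi * t / 1000)
        obs = truth + rng.gauss(0, 80)   # noisy "measurement" from one game
        est += k * (obs - est)           # update after every game
        sq_err += (est - truth) ** 2
    return math.sqrt(sq_err / n)

# Sweep from very sluggish (small k) to fully reactive (k = 1).
errors = {k: tracking_error(k) for k in (0.001, 0.01, 0.05, 0.2, 1.0)}
for k, err in errors.items():
    print(f"k={k:<5} rmse={err:.1f}")
```

With `k = 1.0` the estimate simply copies each noisy observation, and with `k = 0.001` it barely moves and misses the drift; in this toy setup the intermediate values do better than both extremes. Where the sweet spot sits depends entirely on the made-up noise level and drift speed, so the numbers themselves should not be read as evidence about any actual rating system.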

Then again, as was just pointed out, different metrics of “goodness” might give dramatically different “evaluations” to the same rating system.

 For example, going back to what stone_defender said, it might be a better user experience for the system to catch on quickly to the player being in a slump, to help them lose fewer games in the moment, even if the only way to have that quick reactivity is for the system to also latch on to random noise.


Well, that’s it for now. I had other interesting thoughts, but I need time to organize them in my head, and this message is already long enough :sweat_smile: