Regarding normal distributions:
In my opinion, normal distributions do have more going for them than their mathematical convenience and the fact that they’ve been well studied. It’s not just an arbitrary choice of bell curve: the Central Limit Theorem says that, if a larger random effect is the sum of many small, independent effects, then it will specifically look (approximately) like a normal distribution. So there is often theoretical justification for choosing a normal distribution over other bell curves, although I agree it’s sometimes overstated.
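If you want to see the Central Limit Theorem at work, here is a minimal sketch using only the Python standard library. The choice of 50 uniform effects on [-1, 1], the seed, and the function name `summed_effect` are all my own assumptions for illustration; nothing here comes from Glicko itself.

```python
import random
import statistics

def summed_effect(n_effects, rng):
    """One 'large' effect: the sum of n_effects small uniform effects on [-1, 1]."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_effects))

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [summed_effect(50, rng) for _ in range(20_000)]

mean = statistics.fmean(samples)
stdev = statistics.pstdev(samples)

# For a normal distribution, about 68.3% of values lie within one standard
# deviation of the mean and about 95.4% within two.
within_1 = sum(abs(x - mean) <= stdev for x in samples) / len(samples)
within_2 = sum(abs(x - mean) <= 2 * stdev for x in samples) / len(samples)
print(f"within 1 sd: {within_1:.3f}, within 2 sd: {within_2:.3f}")
```

Each individual effect is uniform (flat, not bell-shaped at all), yet the fractions printed for the sum should land close to the normal-distribution values.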
Also, I think there is something to the mathematical convenience, beyond just “it’s convenient.” It’s a simplifying assumption that discards higher order effects when they might not be relevant (or when they might be smaller than other sources of error). Kind of like an engineer or physicist deciding to treat their mechanical springs as ideal, linear springs. The log-likelihood of a normal distribution (our good friend!) is a parabola, i.e. a quadratic curve. So deciding to approximate a distribution as normal is akin to deciding to approximate a more complicated function with a simple, low-degree polynomial. This approximation will be better if the data points you are interested in are closer to the mean, rather than out near the tails. (If you’ve studied calculus, remember that there is a formal basis for this approximation via Taylor’s Theorem.)
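Here is a rough sketch of that Taylor-expansion point. As the stand-in for "a more complicated bell curve" I picked the logistic distribution, which is purely my choice for illustration; the helper names are made up too. The code fits a parabola to the log-density at the mean and checks how far off it is as you move toward the tails.

```python
import math

def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a normal distribution: exactly quadratic in x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def logistic_logpdf(x, mu=0.0, s=1.0):
    """Log-density of a logistic distribution: bell-shaped, but not quadratic."""
    z = (x - mu) / s
    return -z - math.log(s) - 2 * math.log1p(math.exp(-z))

def quadratic_fit_at_mean(logpdf, h=1e-3):
    """Second-order Taylor expansion of logpdf around 0 (the mean and mode here)."""
    f0 = logpdf(0.0)
    f2 = (logpdf(h) - 2 * f0 + logpdf(-h)) / h**2  # numerical second derivative
    # The first derivative vanishes at the mode, so there is no linear term.
    return lambda x: f0 + 0.5 * f2 * x**2

approx = quadratic_fit_at_mean(logistic_logpdf)
for x in (0.5, 2.0, 6.0):
    err = abs(approx(x) - logistic_logpdf(x))
    print(f"x = {x}: error of the parabola = {err:.4f}")
```

The error should be tiny near the mean and grow quickly in the tails, which is the "ideal spring" trade-off in numerical form. Running the same fit on `normal_logpdf` gives essentially zero error everywhere, since that curve already is a parabola.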
So, IMO, there’s good reason to treat the normal distribution as distinguished among the other bell curves, not just a kind of intellectual laziness.
Regarding Glicko's assumptions:
Meili_yinhua’s description of Glicko(-2)'s use of the normal/Gaussian distribution agrees with my understanding. I’d like to clarify a few points further, because I think it might be enlightening.
Glicko and similar systems, as far as I understand them, use the normal/Gaussian distribution in two different ways. First, the underlying model assumes that players’ strengths vary randomly following a Brownian motion pattern, which is like a continuous-time version of a normal distribution. Again, this is an assumption of the underlying model; the system assumes that the world actually looks like this (or at least, close enough that the assumption will yield useful results).
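For intuition about the Brownian motion assumption, here is a small sketch that samples a drifting strength at discrete rating periods, which amounts to a random walk with Gaussian steps. The starting strength of 1500, the step size of 10 points per period, and the names `drift_strength` and `spread` are hypothetical values I chose, not Glicko parameters.

```python
import random
import statistics

def drift_strength(start, step_sd, n_periods, rng):
    """Let a strength wander for n_periods, adding one Gaussian step per period."""
    strength = start
    for _ in range(n_periods):
        strength += rng.gauss(0.0, step_sd)
    return strength

rng = random.Random(1)  # fixed seed so the run is repeatable
step_sd = 10.0          # hypothetical drift per period, in rating points

spread = {}
for n_periods in (1, 4, 16):
    finals = [drift_strength(1500.0, step_sd, n_periods, rng) for _ in range(5_000)]
    spread[n_periods] = statistics.pstdev(finals)
    print(f"after {n_periods:2d} periods: spread of strengths = {spread[n_periods]:.1f}")
```

The spread should grow like the square root of elapsed time (quadrupling the time roughly doubles it), which is why, under this kind of model, uncertainty about an inactive player grows the longer they stay away.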
Second, it makes a computational simplification by approximating the information we have about a player’s strength at any point in time by a normal/Gaussian distribution. True Bayesian inference from game results yields a complicated, awful distribution that, as far as I know, can’t really be parameterized with a fixed number of parameters. The system summarizes this information with a kind of best-fit normal/Gaussian distribution, and discards the additional information. This discarding of information causes, for example, the phenomenon we discussed earlier where a new player can lose to 1d players and have their humble rank rise.
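To illustrate the "summarize as a best-fit Gaussian" step, here is a toy sketch, and I want to stress that it is not Glicko's actual update formula. It computes the exact posterior on a grid of candidate strengths and then keeps only the mean and standard deviation. Assumptions on my part: opponents' strengths are treated as exactly known, the win probability uses the Elo-style base-10 logistic with a 400-point scale, and the names `win_prob` and `gaussian_summary` are made up.

```python
import math

def win_prob(diff, scale=400.0):
    """Logistic win probability given a strength difference (assumed Elo-style scale)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

def gaussian_summary(prior_mean, prior_sd, games, lo=0.0, hi=3000.0, step=1.0):
    """Grid-based Bayesian update, then keep only the posterior mean and sd.

    games is a list of (opponent_strength, won) pairs.
    """
    grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    weights = []
    for s in grid:
        # Log of the (unnormalized) Gaussian prior at this candidate strength.
        log_w = -0.5 * ((s - prior_mean) / prior_sd) ** 2
        # Add the log-likelihood of each observed game result.
        for opponent, won in games:
            p = win_prob(s - opponent)
            log_w += math.log(p if won else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, grid)) / total
    var = sum(w * (s - mean) ** 2 for w, s in zip(weights, grid)) / total
    # Everything about the posterior's shape beyond these two numbers is discarded.
    return mean, math.sqrt(var)

mean_after_win, sd_after_win = gaussian_summary(1500.0, 350.0, [(1500.0, True)])
print(f"after one win: mean {mean_after_win:.0f}, sd {sd_after_win:.0f}")
```

The true posterior after the win is skewed, but the summary only remembers two numbers, so the next update starts from a symmetric bell curve again.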
The “interpolation” you talk about, @espoojaram, is the Bayesian inference that meili_yinhua is talking about, combined with the information-discarding summary of the resulting information as a normal/Gaussian distribution.
This isn’t right. The Brownian motion is Gaussian, but it is trying to model how a player’s innate strength changes over time. The model assumes that game-to-game fluctuations in performance, in the face of an opponent facing similar fluctuations, follow a sort of logistic distribution: the probability of winning is a logistic function of the difference in strengths. When two players have a particular difference in strengths, the model assumes that this means that the game outcome has a particular probability, taking into account that the skill a player showcases on any given day is random. The thing that’s moving via Brownian motion is the probability of having a good day, not how good a day they’re having, so to speak.
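A minimal sketch of that idea, assuming the Elo-style convention of a base-10 logistic with a 400-point scale (Glicko additionally adjusts for the opponent's rating deviation, which I leave out here). The function name `win_probability` and the example strengths are hypothetical.

```python
import random

def win_probability(strength_a, strength_b, scale=400.0):
    """Probability that A beats B: a logistic function of the strength difference."""
    return 1.0 / (1.0 + 10.0 ** ((strength_b - strength_a) / scale))

rng = random.Random(2)  # fixed seed so the run is repeatable
p = win_probability(1600.0, 1500.0)

# Each game is a single weighted coin flip; there is no separate
# "how good a day is this player having" variable in the model.
n_games = 10_000
wins = sum(rng.random() < p for _ in range(n_games))
print(f"model probability {p:.3f}, simulated win rate {wins / n_games:.3f}")
```

Note that only the difference in strengths enters: 1600 vs 1500 gives the same probability as 2100 vs 2000.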
The rating deviation is an estimate of our uncertainty about a player’s underlying strength, not our uncertainty about how they will perform in a given game. There is no separate number for that uncertainty: it’s all contained in the single number we call “strength.”
My Wikipedia-based understanding of the history of rating systems is that Elo’s original model assumed that the “strength a player might showcase on any given day” followed a normal/Gaussian distribution, too, but this proved not to be as good a fit for the data as the logistic function.