Ok, I've been writing this for a long time, so keep in mind it's based on your previous reply in the other thread.
I'll expand on your idea to describe something I've been thinking of and hopefully circle back to the volatility question.
I don't necessarily think this is the best way to make my final point, but it's definitely the fun one for me.
So I present to you:
"Go Strength" is Complicated, and Why That Might Not Matter

The latent space and how to study it
I conceptualize "Go strength" in a very similar way as you seem to,
with the difference that I believe there is potentially a way to capture it mathematically.
It's called a "latent space", and in simple terms it's a multidimensional vector space where each dimension is a parameter.
I don't believe we can really know what those parameters should be, but I'll use something similar to the example you gave for the sake of having a concrete idea to hold on to: say one parameter is "reading ability", another is "direction of play instinct", another is "joseki knowledge", and so on, for a total of k parameters.
Now, we have a population of players, and say we have a magical black box that can meaningfully measure those parameters for each player, thus assigning to each player a point in a k-dimensional space.
We can visualize that as a cloud of points, like a scatter plot, but instead of just x and y, we have k dimensions. But since it's probably just a blob, it doesn't really matter.

Now consider a mathematical function that is able to use all of those parameters to estimate, for each point, the probability that a player with those parameters will win against another player with some arbitrary parameters.
Thus we now have, for each point in this k-dimensional space, a function that goes from a k-dimensional space to the real interval [0,1], which you can visualize as assigning a color to each point of the k-dimensional space.

Like this, but there's a different picture for every point of the space
Now let's apply a simplistic idea: for each player/point, calculate the k-measure (the "k-dimensional volume") of the subset of "all the players this player has a more than 50% chance of winning against" (or technically of a continuous approximation of those sets of points, whatever), and divide that by the "volume" of the set of the entire population of players (continuous yadda yadda).
Now you have a simpler function that assigns a number in [0,1] to each point (it might look similar to the fog of the second image though).
Take your finite population of players and compile a histogram based on those numbers.
Call the x coordinate of that histogram a "schmating" and repeat after me: the schmating is a meaningful quantity.
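
To make the definition concrete, here is a minimal Python sketch under loud assumptions: the function `win_probability`, the `WEIGHTS`, and the Gaussian population are all made up for illustration (the real function is unknown), and the "k-dimensional volume" ratio is approximated by simply counting players in a finite sample.

```python
import math
import random

WEIGHTS = [1.0, 0.7, 0.3]  # assumed weight of each latent parameter (hypothetical)

def win_probability(a, b):
    """Hypothetical stand-in for the unknown function: P(player a beats player b)."""
    gap = sum(w * (x - y) for w, x, y in zip(WEIGHTS, a, b))
    return 1.0 / (1.0 + math.exp(-gap))

def schmating(player, population):
    """Fraction of the other players that `player` is favoured (>50%) against."""
    others = [q for q in population if q is not player]
    favoured = sum(win_probability(player, q) > 0.5 for q in others)
    return favoured / len(others)

rng = random.Random(0)
# The "cloud": 500 players, each a point in k = 3 dimensions.
population = [[rng.gauss(0, 1) for _ in WEIGHTS] for _ in range(500)]
schmatings = [schmating(p, population) for p in population]

# Histogram of schmatings in 10 equal-width bins over [0, 1].
bins = [0] * 10
for s in schmatings:
    bins[min(int(s * 10), 9)] += 1
print(bins)
```

With this purely additive toy function the ordering of players is transitive, so the schmating is essentially a percentile rank. A function with style match-ups would not be so tidy.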

Now suppose I told you two players have the same schmating at a certain point in time.
They could be in different points in the cloud, and if you just knew where, you would be able to get a good estimate of the specific probability of the outcome of their match at that point in time.
But without knowing that, and just knowing their schmating, you would have to calculate the probability by calculating some kind of weighted average of all the possible pairs of points in the cloud that have that schmating. And for every pair you consider, you don't know which of the two each player is, so by symmetry I believe the result of the sum would be exactly 50%.
Now if I told you two players have a different schmating, we don't know how difficult it would be to calculate the expected probability of one winning, but I strongly expect that the probability of one randomly selected player winning against a fixed selected one would follow some kind of remarkably smooth sigmoid curve as a function of the difference in their schmating, touching 50% at 0.
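
The symmetry argument and the expected curve can be checked numerically on a toy model. The sketch below assumes a made-up win-probability function with an extra "style clash" term, so that players with equal schmating can still have lopsided match-ups, and then averages the win probability over all ordered pairs, bucketed by schmating difference. Everything here (`WEIGHTS`, `STYLE`, the Gaussian population) is hypothetical.

```python
import math
import random

WEIGHTS = [1.0, 0.7, 0.3]  # assumed weight of each latent parameter (hypothetical)
STYLE = 0.5                # assumed strength of a rock-paper-scissors-like style effect

def win_probability(a, b):
    """Toy P(a beats b); antisymmetric in (a, b), so P(a, b) + P(b, a) = 1."""
    skill_gap = sum(w * (x - y) for w, x, y in zip(WEIGHTS, a, b))
    style_clash = STYLE * (a[0] * b[1] - a[1] * b[0])
    return 1.0 / (1.0 + math.exp(-(skill_gap + style_clash)))

rng = random.Random(1)
players = [[rng.gauss(0, 1) for _ in WEIGHTS] for _ in range(200)]
n = len(players)

# Schmating: fraction of the other players each player is favoured against.
schmating = [
    sum(win_probability(p, q) > 0.5 for q in players if q is not p) / (n - 1)
    for p in players
]

# Average P(i beats j) over all ordered pairs, bucketed by schmating difference.
buckets = {}
for i in range(n):
    for j in range(n):
        if i != j:
            gap = round(schmating[i] - schmating[j], 1)
            buckets.setdefault(gap, []).append(
                win_probability(players[i], players[j])
            )
curve = {gap: sum(ps) / len(ps) for gap, ps in sorted(buckets.items())}

for gap, p in curve.items():
    print(f"{gap:+.1f}  {p:.3f}")
```

Because every pair appears in both orders, the zero-difference bucket averages to 50% by construction, which is the symmetry argument in code form. Whether the rest of the curve looks like a smooth sigmoid depends on the assumed model.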
But of course all of this is much too complicated to calculate in practice, so what use is that for us?
A schmero comes to save the day
One peaceful morning,
the protagonist of our story, Mr. Schmarpad Schmelo, notices a remarkable thing about this system while schmoking his pipe: you can actually perform experiments to measure the schmating even without having any information about the k-dimensional space.
You can pair a bunch of players together, and with statistical tools estimate their probability of winning against all the other players in the population, and from that estimate the schmating. Let's call this estimate of the schmating a "rating".
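
As a rough illustration of measuring without looking inside the latent space, the sketch below simulates random pairings from a hidden toy model and uses only the recorded results. The estimator (plain win fraction against random opponents) is a deliberately crude stand-in chosen for brevity, not a real rating system, and the hidden model, `WEIGHTS`, and sample sizes are all assumptions.

```python
import math
import random

rng = random.Random(3)
WEIGHTS = [1.0, 0.7, 0.3]   # assumed; hidden from the "experimenter" below
N_PLAYERS, N_GAMES = 100, 20000
hidden = [[rng.gauss(0, 1) for _ in WEIGHTS] for _ in range(N_PLAYERS)]

def p_win(i, j):
    """Hidden toy model: probability that player i beats player j."""
    gap = sum(w * (x - y) for w, x, y in zip(WEIGHTS, hidden[i], hidden[j]))
    return 1.0 / (1.0 + math.exp(-gap))

# The experiment: random pairings, only the results are recorded.
wins = [0] * N_PLAYERS
played = [0] * N_PLAYERS
for _ in range(N_GAMES):
    i, j = rng.sample(range(N_PLAYERS), 2)
    winner = i if rng.random() < p_win(i, j) else j
    wins[winner] += 1
    played[i] += 1
    played[j] += 1

# Crude "rating": observed win fraction against randomly drawn opponents.
rating = [w / g for w, g in zip(wins, played)]

# Check against the hidden truth (only possible in a simulation).
schmating = [
    sum(p_win(i, j) > 0.5 for j in range(N_PLAYERS) if j != i) / (N_PLAYERS - 1)
    for i in range(N_PLAYERS)
]
pairs = [(i, j) for i in range(N_PLAYERS) for j in range(i + 1, N_PLAYERS)]
agreement = sum(
    (rating[i] - rating[j]) * (schmating[i] - schmating[j]) > 0 for i, j in pairs
) / len(pairs)
print(f"fraction of player pairs ranked the same way: {agreement:.2f}")
```

The last few lines compare the ordering given by the rating with the ordering given by the hidden schmating, which is the kind of check that is only available when the "magical black box" is simulated.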
Then Mr. Schmelo recognizes that there are many things affecting (1) the schmating itself and (2) the accuracy of our estimate for it, aka the rating.
As a player learns new things or some of their skills increase, the schmating also increases. Also, changes to their mood or just a momentary spur of creative inspiration can arguably make the schmating fluctuate, not only game by game, but move by move.
And also, obviously, if the whole population gets "better" at playing but one player doesn't, that one player's schmating will decrease.
But still, those fluctuations should oscillate around a center, at any period of time, and without having enough information, it makes sense that the meaningful measure you want is that center of gravitation, because on average it gives you the most accurate prediction of their winrate relative to the schmating distribution.
A player can play uncharacteristically good moves in a game (here by "good" we mean in terms of a hypothetical system that can calculate, say, the points lost or gained for any move), but on average those will be balanced out by some uncharacteristically bad moves.
And even if they don't within a game, the games where they play more good moves will on average balance out with the games where they play more bad moves, in the long term.
Since you can't expect to predict those fluctuations, the measure you really want is of the "center of gravitation" around which the schmating moves: that should give you the best estimate for the expected winrate relative to the schmating distribution (or the rating distribution).
Also, reasonably speaking, you can't perform enough measurements to be sure of the schmating, and "which other players a player is matched against" heavily affects how much information you have about it, so if you rely too much on the apparent info given by a single game result, you will easily end up making the rating fluctuate wildly, while the schmating probably only fluctuates around a very smooth and stable line over time (except when the player improves suddenly, say by learning a new joseki, which will likely make their schmating ramp up very fast).
So you develop a statistical model that tries to be flexible and take all of these uncertainties into account, but ideally, the objective of the system will be to guess the current "center of gravitation" as stably as possible, or just generally to have the best, on average, estimate of the expected outcome of any game based on the knowledge of the rating only.
Then of course other people, such as Mr. Schmlickman, can try to build different measuring models, and then by performing experiments you could find out which one is the best, although the model needs to be suitable for that (so there needs to be a meaningful concept of "expected probability of winning" that you can calculate from the rating).
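
One way such an experiment could look, as a sketch only: two Elo-style update rules that differ only in how strongly they react to a single result (the K-factor), run on the same simulated games and compared by rating stability and by the mean squared prediction error (Brier score). The expected-score formula is the standard Elo one; the K values, the fixed true strength of 1600, and the simulated opponents are assumptions made for illustration.

```python
import random
import statistics

def expected_score(rating, opponent):
    """Standard Elo expected score on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))

def run(k_factor, games, start=1600.0):
    """Replay the games with a given K; return rating history and Brier score."""
    rating, history, sq_errors = start, [], []
    for opponent, result in games:
        predicted = expected_score(rating, opponent)
        sq_errors.append((result - predicted) ** 2)
        rating += k_factor * (result - predicted)
        history.append(rating)
    return history, statistics.mean(sq_errors)

rng = random.Random(2)
TRUE_STRENGTH = 1600.0  # assumed constant; a real player's strength drifts
games = []
for _ in range(3000):
    opponent = rng.uniform(1400, 1800)
    p_win = expected_score(TRUE_STRENGTH, opponent)
    result = 1.0 if rng.random() < p_win else 0.0
    games.append((opponent, result))

for k in (8, 64):  # arbitrary "cautious" and "jumpy" settings
    history, brier = run(k, games)
    spread = statistics.pstdev(history)
    print(f"K={k:2d}  rating spread={spread:6.1f}  Brier={brier:.4f}")
```

In this simulation the player's strength never changes, which favours the cautious setting by design. On real data, where players do improve, the trade-off is open, and that is exactly the part that has to be tested.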
So in conclusion:
I don't think it really matters that the "true nature of the Go strength of a human" is a complicated multidimensional monster: in the end, it is possible to compress some of that information into a one-dimensional spectrum in a meaningful way that can also be measured,
and in the end, I believe that the rating being stable and less susceptible to fluctuations makes for better winrate predictions, which is how you get better matchmaking, although this is a hypothesis that needs to be tested, and can be.