Testing the Volatility: Summary

Ok, I’ve been writing this for a long time, so keep in mind it’s based on your previous reply in the other thread.

I’ll expand on your idea to describe something I’ve been thinking of and hopefully circle back to the volatility question.

I don’t necessarily think this is the best way to make my final point, but it’s definitely the fun one for me.

So I present to you:

“Go Strength” is Complicated, and Why That Might Not Matter

 The latent space and how to study it

I conceptualize “Go strength” in much the same way you seem to, with the difference that I believe there may be a way to capture it mathematically.

It’s called a “latent space”, and in simple terms it’s just a multidimensional vector space where each dimension is a parameter.

I don’t believe we can really know what those parameters should be, but to have something concrete to hold on to, I’ll use something similar to the example you gave: say one parameter is “reading ability”, another is “direction-of-play instinct”, another is “joseki knowledge”, and so on, for k parameters in total.

Now, say we have a population of players and a magical black box that can meaningfully measure those parameters for each player, thus assigning each player a point in a k-dimensional space.

We can visualize that as a cloud of points, like a scatter plot, but with k dimensions instead of just x and y. Since it’s probably just a blob anyway, that doesn’t really matter.

image

Now consider a mathematical function that can use all of those parameters to estimate, for each point, the probability that a player with those parameters will beat another player with some arbitrary parameters.

So we now have, for each point in this k-dimensional space, a function from the k-dimensional space to the real interval [0,1] (the opponent’s point goes in, the winning probability comes out), which you can visualize as assigning a color to each point of the space.
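In symbols, with ad hoc (not standard) notation and assuming for simplicity that games can’t end in a draw, the function is a p(a, b), and each colored picture is p with the first player held fixed:

```latex
\begin{aligned}
&p : \mathbb{R}^k \times \mathbb{R}^k \to [0,1], && p(a,b) = \Pr(\text{a player at } a \text{ beats a player at } b),\\
&f_a : \mathbb{R}^k \to [0,1], && f_a(x) = p(a,x),\\
&p(a,b) + p(b,a) = 1 && \text{(assuming no draws, so in particular } p(a,a) = \tfrac{1}{2}\text{)}.
\end{aligned}
```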

image
Like this, but there’s a different picture for every point of the space

Now let’s apply a simplistic idea: for each player/point, calculate the k-measure (the “k-dimensional volume”) of the set of “all the players this player has a more than 50% chance of beating” (technically, of a continuous approximation of that set of points, whatever), and divide it by the “volume” of the entire population of players (continuous yadda yadda).

Now you have a simpler function that assigns a number in [0,1] to each point (though it might look similar to the fog in the second image).
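Written out in ad hoc notation, where 𝒫 is the region of the k-dimensional space occupied by the population, p(a, x) is the win probability described above, and λ_k is k-dimensional volume, the schmating of a player at a is:

```latex
\begin{aligned}
W(a) &= \{\, x \in \mathcal{P} : p(a,x) > \tfrac{1}{2} \,\},\\
s(a) &= \frac{\lambda_k\bigl(W(a)\bigr)}{\lambda_k(\mathcal{P})} \in [0,1].
\end{aligned}
```

Replacing λ_k with the population’s own distribution (weighting each region by how many players sit in it rather than by raw volume) turns s(a) into the probability that a randomly drawn player is someone a beats more than half the time.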

Take your finite population of players and compile a histogram based on those numbers.

Call the x coordinate of that histogram a “schmating” and repeat after me: the schmating is a meaningful quantity.

image
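For a finite population, the ratio of volumes becomes a ratio of counts. Below is a minimal sketch in Python with NumPy under a made-up toy model: players are random points in a 3-dimensional latent space, and the win probability is a logistic function of a weighted skill difference plus an antisymmetric “style matchup” term, so p(a, b) + p(b, a) = 1 and results need not be transitive. The names `K`, `N`, `w`, `M`, `win_prob` and all the numbers are hypothetical. Counting players weights each region by how many players are in it, not by its raw volume.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3      # hypothetical number of latent skills
N = 500    # hypothetical population size

# Hypothetical population: each row is one player's point in the latent space.
players = rng.normal(size=(N, K))

# Hypothetical toy win model: weighted skill difference plus an antisymmetric
# "style matchup" term (M.T == -M), so p(a, b) + p(b, a) == 1.
w = np.array([1.0, 0.6, 0.3])
M = np.array([[0.0, 0.5, -0.2],
              [-0.5, 0.0, 0.4],
              [0.2, -0.4, 0.0]])


def win_prob(a: np.ndarray, b: np.ndarray) -> float:
    """Probability that a player at point a beats a player at point b (toy model)."""
    logit = (a - b) @ w + a @ M @ b
    return float(1.0 / (1.0 + np.exp(-logit)))


# Same model for every pair at once: P[i, j] = win_prob(players[i], players[j]).
skill = players @ w
P = 1.0 / (1.0 + np.exp(-(skill[:, None] - skill[None, :] + players @ M @ players.T)))

# Schmating: fraction of the *other* players this player beats more than half the time.
beats = P > 0.5
np.fill_diagonal(beats, False)
schmating = beats.sum(axis=1) / (N - 1)

# The histogram: how many players fall into each schmating bin.
counts, edges = np.histogram(schmating, bins=10, range=(0.0, 1.0))
```

Any other `win_prob` can be swapped in; the symmetry argument about equal schmatings only relies on p(a, b) + p(b, a) = 1.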

Now suppose I told you two players have the same schmating at a certain point in time.

They could be at different points in the cloud, and if you knew where, you could get a good estimate of the probability of each outcome of their match at that point in time.

But without knowing that, and knowing only their schmatings, you would have to compute the probability as some kind of weighted average over all the possible pairs of points in the cloud that have that schmating. And for every pair you consider, you don’t know which of the two points each player is, so by symmetry I believe the result would be exactly 50%.
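The symmetry step can be written out in a few lines. This assumes no draws, so p(a, b) + p(b, a) = 1, and that the two players’ positions A and B are drawn independently from the same distribution over the points with the given schmating, which makes (A, B) and (B, A) equally likely:

```latex
\begin{aligned}
\mathbb{E}\,[p(A,B)] &= \mathbb{E}\,[p(B,A)] && \text{(exchangeability: } (A,B) \overset{d}{=} (B,A)\text{)}\\
&= \mathbb{E}\,[1 - p(A,B)] && \text{(no draws)}\\
&= 1 - \mathbb{E}\,[p(A,B)]
\quad\Longrightarrow\quad \mathbb{E}\,[p(A,B)] = \tfrac{1}{2}.
\end{aligned}
```

Nothing here depends on the shape of p beyond that symmetry, which is why the answer comes out exactly 50%.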

Now, if I told you two players have different schmatings, I’m not sure how difficult it would be to calculate the expected probability of one of them winning, but I strongly expect that the probability of a randomly selected player beating a fixed one would follow a remarkably smooth sigmoid curve as a function of the difference in their schmatings, crossing 50% at 0.
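The sigmoid guess can at least be poked at numerically. Here is a sketch under a made-up toy model (random 3-dimensional players, logistic win probability with an antisymmetric interaction term; every name and number is hypothetical): it groups all ordered pairs of players by the difference in their schmatings and averages the toy win probability within each group.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 400, 3                                   # hypothetical population size and skill count
players = rng.normal(size=(N, K))
w = np.array([1.0, 0.6, 0.3])                   # hypothetical skill weights
M = np.array([[0.0, 0.5, -0.2],
              [-0.5, 0.0, 0.4],
              [0.2, -0.4, 0.0]])                # antisymmetric "style matchup" term

# Toy model: P[i, j] = probability that player i beats player j (P + P.T == 1).
skill = players @ w
P = 1.0 / (1.0 + np.exp(-(skill[:, None] - skill[None, :] + players @ M @ players.T)))

# Schmating as a ratio of counts.
beats = P > 0.5
np.fill_diagonal(beats, False)
schmating = beats.sum(axis=1) / (N - 1)

# Every ordered pair (i, j) with i != j, its schmating difference and win probability.
i, j = np.where(~np.eye(N, dtype=bool))
diff = schmating[i] - schmating[j]
p_ij = P[i, j]

# Average win probability in 20 equal-width bins of the difference over [-1, 1].
edges = np.linspace(-1.0, 1.0, 21)
bin_of = np.digitize(diff, edges[1:-1])         # bin index 0..19
curve = np.full(len(edges) - 1, np.nan)         # NaN marks empty bins
for b in range(len(edges) - 1):
    in_bin = bin_of == b
    if in_bin.any():
        curve[b] = p_ij[in_bin].mean()
centers = (edges[:-1] + edges[1:]) / 2
```

Plotting `curve` against `centers` shows whether a given toy model really produces a smooth sigmoid through 50%. Because every pair appears in both orders, the bins at d and -d average to probabilities that add up to roughly 1.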

But of course all of this is much too complicated to calculate in practice, so what use is that for us?

 A schmero comes to save the day

One peaceful morning, the protagonist of our story, Mr. Schmarpad Schmelo, notices a remarkable thing about this system while schmoking his pipe: you can actually perform experiments to measure the schmating even without any information about the k-dimensional space.

You can pair a bunch of players together, use statistical tools to estimate each one’s probability of beating all the other players in the population, and from that estimate the schmating. Let’s call this estimate of the schmating a “rating”.
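What Mr. Schmelo stumbles on is, unsurprisingly, close to the real Elo system. A minimal sketch of the standard Elo expected score and update follows; the 400-point logistic scale is the usual convention, while the K-factor of 20 is only an example value and the function names are hypothetical:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo's predicted probability that A beats B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 20.0) -> tuple[float, float]:
    """Update both ratings after one game; score_a is 1.0 if A won and 0.0 if A lost."""
    # Each player moves by K times the gap between the actual and the predicted result.
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta
```

For two players rated 1500, `expected_score(1500, 1500)` is 0.5, and `elo_update(1500, 1500, 1.0)` returns `(1510.0, 1490.0)`.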

Then Mr. Schmelo recognizes that many things affect (1) the schmating itself and (2) the accuracy of our estimate of it, aka the rating.

As a player learns new things or some of their skills improve, the schmating also increases. Changes in mood, or just a momentary spur of creative inspiration, can arguably make the schmating fluctuate too, not only game by game but move by move.
And, obviously, if the whole population gets “better” at playing but one player doesn’t, that player’s schmating will decrease.

But still, over any given period of time those fluctuations should oscillate around a center, and without enough information it makes sense that the meaningful measure you want is that center of gravitation, because on average it gives you the most accurate prediction of the player’s winrate relative to the schmating distribution.

A player can play uncharacteristically good moves in a game (here “good” means good according to a hypothetical system that can calculate, say, the points lost or gained by any move), but on average those will be balanced out by some uncharacteristically bad moves.
And even if they don’t balance out within a game, in the long term the games where the player makes more good moves will, on average, balance out with the games where they make more bad ones.

Since you can’t expect to predict those fluctuations, what you really want to measure is the “center of gravitation” around which the schmating moves: that should give you the best estimate of the expected winrate relative to the schmating distribution (or the rating distribution).

Also, realistically, you can’t perform enough measurements to be sure of the schmating, and which opponents a player is matched against heavily affects how much information you get about it. So if you rely too much on the apparent information from a single game result, you can easily end up with a rating that fluctuates wildly, while the schmating probably only fluctuates around a very smooth and stable line over time (except when the player improves suddenly, say by learning a new joseki, which will likely make their schmating ramp up very fast).
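The “rely too much on a single game” effect is easy to simulate. A minimal sketch, assuming standard Elo updates and a hypothetical setup where one player’s true strength stays fixed at 1500 and every opponent is accurately rated: the K-factor controls how much each single result moves the rating, so with a larger K the rating should wander further even though the true strength never changes. The game count, opponent spread and K values are illustrative.

```python
import random
import statistics


def expected_score(r_a: float, r_b: float) -> float:
    """Elo's predicted probability that A beats B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def simulate_rating(k: float, true_strength: float = 1500.0,
                    games: int = 5000, seed: int = 0) -> list[float]:
    """Elo rating history of one player whose true strength never changes (hypothetical setup)."""
    rng = random.Random(seed)
    rating, history = true_strength, []
    for _ in range(games):
        opponent = true_strength + rng.gauss(0, 200)      # accurately rated opponent near the player's level
        p_win = expected_score(true_strength, opponent)   # results depend only on the true strength
        score = 1.0 if rng.random() < p_win else 0.0
        rating += k * (score - expected_score(rating, opponent))
        history.append(rating)
    return history


# Same simulated games (same seed), two different K-factors.
for k in (8, 32):
    history = simulate_rating(k)
    print(f"K={k}: rating spread (std) over time = {statistics.pstdev(history):.1f}")
```

Because the seed is fixed and the results never depend on the current rating, both runs see exactly the same games; only K differs.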

So you develop a statistical model that tries to be flexible and take all of these uncertainties into account, but ideally, the objective of the system is to estimate the current “center of gravitation” as stably as possible, or more generally, to give the best average estimate of the expected outcome of any game based on the ratings alone.
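Real systems of this kind exist. One is Mark Glickman’s Glicko system, where each player carries a rating and a rating deviation (RD): RD shrinks as results come in and grows again while the player is inactive, and results against opponents with uncertain ratings count for less. Below is a minimal sketch of the Glicko-1 update for one rating period, transcribed from the published formulas; the function names are hypothetical, and c = 35 (how fast uncertainty grows per inactive period) is only an example value.

```python
import math

Q = math.log(10) / 400


def g(rd: float) -> float:
    """Down-weights results against opponents whose own rating is uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, r_j: float, rd_j: float) -> float:
    """Expected score against opponent j, discounted by the opponent's RD."""
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))


def glicko1_update(r: float, rd: float, games: list[tuple[float, float, float]]) -> tuple[float, float]:
    """One rating period. games: list of (opponent_rating, opponent_rd, score)."""
    # 1/d^2: how much information this period's games carry about the rating.
    d2_inv = Q**2 * sum(g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
                        for r_j, rd_j, _ in games)
    denom = 1 / rd**2 + d2_inv
    r_new = r + Q / denom * sum(g(rd_j) * (s - expected(r, r_j, rd_j)) for r_j, rd_j, s in games)
    rd_new = math.sqrt(1 / denom)
    return r_new, rd_new


def inflate_rd(rd: float, periods: int, c: float = 35.0, rd_max: float = 350.0) -> float:
    """Uncertainty grows while a player is inactive, capped at rd_max."""
    return min(math.sqrt(rd**2 + c**2 * periods), rd_max)
```

A 1500 player at RD 200 who beats a 1400 (RD 30) and loses to a 1550 (RD 100) and a 1700 (RD 300) comes out at roughly (1464, 151.4), which matches the worked example in Glickman’s own description of the system.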

Then of course other people, such as Mr. Schmlickman, can try to build different measuring models, and by performing experiments you can find out which one is best, although the model needs to be suitable for that (there needs to be a meaningful notion of “expected probability of winning” that you can calculate from the ratings).
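One standard way to run such experiments is to have each model predict the win probability for games it hasn’t seen yet, then score those predictions with a proper scoring rule such as the log loss or the Brier score; the model with the lower score predicted better. A minimal sketch (the function names are hypothetical):

```python
import math


def log_loss(predictions: list[float], outcomes: list[int]) -> float:
    """Mean negative log-likelihood of the observed results; lower is better."""
    eps = 1e-12                                  # guard against log(0)
    total = 0.0
    for p, y in zip(predictions, outcomes):      # y is 1 if the first player won, else 0
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(outcomes)


def brier_score(predictions: list[float], outcomes: list[int]) -> float:
    """Mean squared error of the predicted win probabilities; lower is better."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)
```

Both need a predicted probability for every game, which is exactly why a model has to come with a meaningful “expected probability of winning”. Log loss punishes confident wrong predictions much more heavily than the Brier score does.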

 So in conclusion:

I don’t think it really matters that the “true nature of the Go strength of a human” is a complicated multidimensional monster: in the end, it is possible to compress some of that information into a one-dimensional spectrum in a way that is both meaningful and measurable.

And I believe that a rating that is stable and less susceptible to fluctuations makes for better winrate predictions, which is how you get better matchmaking, although this is a hypothesis that needs to be, and can be, tested.
