After how many moves does a game become unique (statistically)?

gennan · March 22, 2021, 11:06pm

I think that’s a reasonable estimate. For an accurate nummer, you’d really have to analyse all the games of a large pro game database.

meili_yinhua · March 22, 2021, 11:35pm

a similarly dumb estimate (but with more math and spreadsheets!) based on my model above but only calculating the shannon entropy of the empty board position (as opposed to that of various positions), I get results that look a bit like this

for 0.05 < p < 0.95, the 15th move is the first to have > 5% chance of being unique, and the 17th is the last to have < 95% chance

for 0.01 < p < 0.99, the 12th is the first to have > 1% chance of being unique, and the 19th is the last to have < 99% chance

for 0.001 < p < 0.999, the 11th is the first to have > 0.1% chance of being unique and the 22nd is the last to have < 99.9% chance.

all the math is in the spreadsheet is here: entropy of empty board - Google Sheets

it seems to suggest a much smaller window than yours, but it is only taken on one data point, and seems to be within the ballpark of possibility (altho the distribution tapers too fast to account for the 78 move pair of games)

gennan · March 22, 2021, 11:57pm

That Shannon Entropy seems to be similar to what I loosely called “branching factor”.
The empty board in Waltheri’s seems to have a relatively large “branching factor” of 2 (at least for pro games).
If you assume this continues to be the case in the whole opening, naturally you end up near my lower estimate of 16 moves for pro game uniqueness.
But when I look further info the openings in Waltheri’s, the branching factor seems to decrease after a couple of moves. Joseki are closer to one way streets, reducing the branching factor and thus extending the distance to uniqueness.

meili_yinhua · March 23, 2021, 12:03am

This is possible, that future moves have lower information: as players are not independent and two who go down one path are more likely to continue down the same path. I’d hypothesize there could be a linear correlation between the information at a certain position and its depth into the tree

gennan · March 23, 2021, 12:12am

Besides this Entropy, we also have the concept of Temperature of a go position. We might discover some thermodynamical laws about the game of go

Kosh · March 23, 2021, 12:46am

Yeah, dah… Go is HOT.

meili_yinhua · March 23, 2021, 12:48am

so far to follow up I’ve gone through the top 10 branches after move 1 and come up with an average entropy for move 2 (second page of the sheet linked earlier) at the high number of 2.17! (for comparison, the empty board had an entropy of 1.19), naturally this does not rule out the idea of “negative slope linear relationship” as this is likely due to the removal of all the symmetries being grouped together. 2.17 corresponds to a branching of 4.5, and 1.19 to that of 2.28, which is just over double, which considering there are 4-8 times as many options on move 2 does indicate a good likelihood of a decreasing relationship once these symmetries are of lesser importance.

Samraku · March 23, 2021, 1:33am

Can a game of Go be reconstructed from the final position?

Multi-vac: There is of yet insufficient information for an answer.

Kosh · March 23, 2021, 1:47am

I would guess confidently, ‘Not’. There can easily be endgame moves that are completely interchangeable with no strategic consequences.

meili_yinhua · March 23, 2021, 3:24am

Same with opening moves, I’ve gone through many positions in the opening that had games transposed into each other

gennan · March 23, 2021, 6:48am

Once each corner has a stone or an enclosure, I expect that symmetry will usually be broken and it also becomes more likely that joseki will appear. By move 15 I expect that most pro games will have at least 1 joseki on the board (or are in the process of playing a joseki if it is a long one).

Samraku · March 23, 2021, 10:08am

It was supposed to be a reference to Isaac Asimov’s short story “The Last Question” where the titular question is “Can entropy be reversed?”, so the impossibility is deliberate.

meili_yinhua · March 24, 2021, 11:37am

To follow up again, I’ve now gotten 10 large samples of the tree up to positions 5 move deep (in the same spreadsheet, page labelled move 6 as that is it refers to the entropy of the 6th move),

The entropies of each move (and N, referring to the amount of games in the database that pass through sampled nodes) respectively is
move 1 (empty board) = 1.185864003 (N=79 788)
move 2 = 2.167951807 (N = 79 755)
move 3 = 2.098743995 (N = 73 082)
move 4 = 1.72368469 (N = 53 732)
move 5 = 2.222664175 (N = 41 526)
move 6 = 2.223226401 (N = 23 108)

There seem to not be many signs of a decrease (other than move 4 lowering in entropy), but no forced sequences have been experienced, so there’s still hope for the theory of it decreasing.

(also note: a lot of unique games have already been spotted in these sequences, if someone wants to collect that data to estimate the number of unique games at each depth, that is very likely to also be an effective means)

Maharani · March 25, 2021, 8:12pm

Cool that this thread is active again.

I wonder if the amount of identical opening positions increases if we ignore move order? I’m still kind of baffled that games seem to usually become unique after at most 20-odd moves, given how common a lot of joseki are.

GreenAsJade · March 25, 2021, 10:12pm

There’s nothing like making some gut-feel observations about statistics to get real statisticians moving to prove you wrong

Kosh · March 25, 2021, 10:16pm

Yep. Works nine times out of ten.

meili_yinhua · March 26, 2021, 3:31am

part of it is based on sample size, part of it is that there are a lot of joseki to choose from, and pros are the people who know more joseki than anyone, part of it is there are so many good options even before joseki come into play, and part of it is that the popular joseki keep changing even among pros.

(also you could argue the set of pro games in waltheri is not all that massive, and it’s kinda to be expected if you go by things like branching factors given the size of the data set)

bugcat · March 26, 2021, 6:20am

the set of pro games in waltheri is not all that massive

I think Waltheri has about half of the surviving public professional kifu, possibly more like 60%.

Waltheri has around 80,000 games. I read that GoGoD has something like 130,000, but I have a feeling that that post was written around a decade ago.

meili_yinhua · March 26, 2021, 7:04am

don’t get me wrong, 80K is definitely a large amount, even for a statistical study, but in comparison to chess, that’s about half the amount of recorded Master games that start with the fourth most popular move (1.c4) in the Lichess database (the total amount is over 2 million).

And probably there are at least a couple hundred games of go being played on ama servers every day… OGS alone has serialized IDs that (granted are not all complete games of standard size and without handicap) exceed 31 million. If we assume that even 1/100th of those games are within those parameters, that’s already 3.7 times more games than in the waltheris database (which has records spanning several centuries), and that’s just one server across a decade or two.

Like, don’t get me wrong, it’s probably an OK estimate for pro games as a whole (if a little bit of an underestimate), but when you’re talking uniqueness, the size of the set you’re working with matters, as this will probably tell us very little about what the same results look like for amateur games.

bugcat · July 12, 2021, 9:25pm

I was discussing this with mekriff on twitch just now, so I decided to make some dumb estimates for more liberal parameters as well.

Let T(ultrahigh) be 1,000,000,000 (a trillion) games of Go ever played by humans.

Let P(superlow) be a branching factor of 1.75 and P(ultralow) one of 1.5.

Now let’s reproduce the table with the addition of these new values.

T	P	D	Equation
low	ultralow	move 42	1.5 ^ 52 > 1b
low	superlow	move 38	1.75 ^ 38 > 1b
low	low	move 30	2 ^ 30 > 1b
low	mid	move 19	3 ^ 19 > 1b
low	high	move 14	5 ^ 14 > 1b
—
mid	ultralow	move 57	1.5 ^ 57 > 10b
mid	superlow	move 38	1.75 ^ 38 > 10b
mid	low	move 34	2 ^ 34 > 10b
mid	mid	move 21	3 ^ 21 > 10b
mid	high	move 15	5 ^ 15 > 10b
–-
high	ultralow	move 63	1.5 ^ 63 > 100b
high	superlow	move 46	1.5 ^ 46 < 100b
high	low	move 37	2 ^ 37 > 100b
high	mid	move 24	3 ^ 24 > 100b
high	high	move 16	3 x 16 > 100b
–-
ultrahigh	ultralow	move 69	1.5 ^ 69 > 1T
ultrahigh	superlow	move 50	1.75 ^ 50 > 1T
ultrahigh	low	move 40	2 ^ 40 > 1T
ultrahigh	mid	move 26	3 ^ 26 > 1T
ultrahigh	high	move 18	5 ^ 18 > 1T

Is it possible to get an idea of the most accurate value of T?

Suppose that there are 1 million people each playing 100 games / year and that the main period is 100 years. That would be 10b games, what I’ve called T(mid).

I suspect that T(mid) is the most accurate, but ofc it’s a matter of debate.

As for branching factor, I’m not sure a game with branching factor of less than 2 in the first fifty moves is any any game worth playing at all. My intuition is that P(mid) or even P(high) is the better model.

low, mid = 14 moves
high, low = 37 moves

I still consider these the most reasonable boundaries.