After how many moves does a game become unique (statistically)?

I think that’s a reasonable estimate. For an accurate nummer, you’d really have to analyse all the games of a large pro game database.

1 Like

a similarly dumb estimate (but with more math and spreadsheets!) based on my model above but only calculating the shannon entropy of the empty board position (as opposed to that of various positions), I get results that look a bit like this

for 0.05 < p < 0.95, the 15th move is the first to have > 5% chance of being unique, and the 17th is the last to have < 95% chance

for 0.01 < p < 0.99, the 12th is the first to have > 1% chance of being unique, and the 19th is the last to have < 99% chance

for 0.001 < p < 0.999, the 11th is the first to have > 0.1% chance of being unique and the 22nd is the last to have < 99.9% chance.

all the math is in the spreadsheet is here: entropy of empty board - Google Sheets

it seems to suggest a much smaller window than yours, but it is only taken on one data point, and seems to be within the ballpark of possibility (altho the distribution tapers too fast to account for the 78 move pair of games)

1 Like

That Shannon Entropy seems to be similar to what I loosely called “branching factor”.
The empty board in Waltheri’s seems to have a relatively large “branching factor” of 2 (at least for pro games).
If you assume this continues to be the case in the whole opening, naturally you end up near my lower estimate of 16 moves for pro game uniqueness.
But when I look further info the openings in Waltheri’s, the branching factor seems to decrease after a couple of moves. Joseki are closer to one way streets, reducing the branching factor and thus extending the distance to uniqueness.

2 Likes

This is possible, that future moves have lower information: as players are not independent and two who go down one path are more likely to continue down the same path. I’d hypothesize there could be a linear correlation between the information at a certain position and its depth into the tree

1 Like

Besides this Entropy, we also have the concept of Temperature of a go position. We might discover some thermodynamical laws about the game of go :wink:

4 Likes

Yeah, dah… Go is HOT. :rofl:

7 Likes

so far to follow up I’ve gone through the top 10 branches after move 1 and come up with an average entropy for move 2 (second page of the sheet linked earlier) at the high number of 2.17! (for comparison, the empty board had an entropy of 1.19), naturally this does not rule out the idea of “negative slope linear relationship” as this is likely due to the removal of all the symmetries being grouped together. 2.17 corresponds to a branching of 4.5, and 1.19 to that of 2.28, which is just over double, which considering there are 4-8 times as many options on move 2 does indicate a good likelihood of a decreasing relationship once these symmetries are of lesser importance.

1 Like

Can a game of Go be reconstructed from the final position?

Multi-vac: There is of yet insufficient information for an answer.

3 Likes

I would guess confidently, ‘Not’. There can easily be endgame moves that are completely interchangeable with no strategic consequences.

3 Likes

Same with opening moves, I’ve gone through many positions in the opening that had games transposed into each other

3 Likes

Once each corner has a stone or an enclosure, I expect that symmetry will usually be broken and it also becomes more likely that joseki will appear. By move 15 I expect that most pro games will have at least 1 joseki on the board (or are in the process of playing a joseki if it is a long one).

It was supposed to be a reference to Isaac Asimov’s short story “The Last Question” where the titular question is “Can entropy be reversed?”, so the impossibility is deliberate.

1 Like

To follow up again, I’ve now gotten 10 large samples of the tree up to positions 5 move deep (in the same spreadsheet, page labelled move 6 as that is it refers to the entropy of the 6th move),

The entropies of each move (and N, referring to the amount of games in the database that pass through sampled nodes) respectively is
move 1 (empty board) = 1.185864003 (N=79 788)
move 2 = 2.167951807 (N = 79 755)
move 3 = 2.098743995 (N = 73 082)
move 4 = 1.72368469 (N = 53 732)
move 5 = 2.222664175 (N = 41 526)
move 6 = 2.223226401 (N = 23 108)

There seem to not be many signs of a decrease (other than move 4 lowering in entropy), but no forced sequences have been experienced, so there’s still hope for the theory of it decreasing.

(also note: a lot of unique games have already been spotted in these sequences, if someone wants to collect that data to estimate the number of unique games at each depth, that is very likely to also be an effective means)

4 Likes

Cool that this thread is active again. :slight_smile:

I wonder if the amount of identical opening positions increases if we ignore move order? I’m still kind of baffled that games seem to usually become unique after at most 20-odd moves, given how common a lot of joseki are.

3 Likes

There’s nothing like making some gut-feel observations about statistics to get real statisticians moving to prove you wrong :smiley:

6 Likes

Yep. Works nine times out of ten. :rofl:

2 Likes

part of it is based on sample size, part of it is that there are a lot of joseki to choose from, and pros are the people who know more joseki than anyone, part of it is there are so many good options even before joseki come into play, and part of it is that the popular joseki keep changing even among pros.

(also you could argue the set of pro games in waltheri is not all that massive, and it’s kinda to be expected if you go by things like branching factors given the size of the data set)

1 Like

the set of pro games in waltheri is not all that massive

I think Waltheri has about half of the surviving public professional kifu, possibly more like 60%.

Waltheri has around 80,000 games. I read that GoGoD has something like 130,000, but I have a feeling that that post was written around a decade ago.

don’t get me wrong, 80K is definitely a large amount, even for a statistical study, but in comparison to chess, that’s about half the amount of recorded Master games that start with the fourth most popular move (1.c4) in the Lichess database (the total amount is over 2 million).

And probably there are at least a couple hundred games of go being played on ama servers every day… OGS alone has serialized IDs that (granted are not all complete games of standard size and without handicap) exceed 31 million. If we assume that even 1/100th of those games are within those parameters, that’s already 3.7 times more games than in the waltheris database (which has records spanning several centuries), and that’s just one server across a decade or two.

Like, don’t get me wrong, it’s probably an OK estimate for pro games as a whole (if a little bit of an underestimate), but when you’re talking uniqueness, the size of the set you’re working with matters, as this will probably tell us very little about what the same results look like for amateur games.

4 Likes

I was discussing this with mekriff on twitch just now, so I decided to make some dumb estimates for more liberal parameters as well.

Let T(ultrahigh) be 1,000,000,000 (a trillion) games of Go ever played by humans.

Let P(superlow) be a branching factor of 1.75 and P(ultralow) one of 1.5.

Now let’s reproduce the table with the addition of these new values.

T P D Equation
low ultralow move 42 1.5 ^ 52 > 1b
low superlow move 38 1.75 ^ 38 > 1b
low low move 30 2 ^ 30 > 1b
low mid move 19 3 ^ 19 > 1b
low high move 14 5 ^ 14 > 1b
mid ultralow move 57 1.5 ^ 57 > 10b
mid superlow move 38 1.75 ^ 38 > 10b
mid low move 34 2 ^ 34 > 10b
mid mid move 21 3 ^ 21 > 10b
mid high move 15 5 ^ 15 > 10b
–-
high ultralow move 63 1.5 ^ 63 > 100b
high superlow move 46 1.5 ^ 46 < 100b
high low move 37 2 ^ 37 > 100b
high mid move 24 3 ^ 24 > 100b
high high move 16 3 x 16 > 100b
–-
ultrahigh ultralow move 69 1.5 ^ 69 > 1T
ultrahigh superlow move 50 1.75 ^ 50 > 1T
ultrahigh low move 40 2 ^ 40 > 1T
ultrahigh mid move 26 3 ^ 26 > 1T
ultrahigh high move 18 5 ^ 18 > 1T

Is it possible to get an idea of the most accurate value of T?

Suppose that there are 1 million people each playing 100 games / year and that the main period is 100 years. That would be 10b games, what I’ve called T(mid).

I suspect that T(mid) is the most accurate, but ofc it’s a matter of debate.

As for branching factor, I’m not sure a game with branching factor of less than 2 in the first fifty moves is any any game worth playing at all. My intuition is that P(mid) or even P(high) is the better model.

low, mid = 14 moves
high, low = 37 moves

I still consider these the most reasonable boundaries.

1 Like