Mmmm, I wouldn’t ever consider this to be a property that can be trusted to be highly reliably true, no matter how many visits you give the search, at least not until we nearly solve the game. (And 19x19 probably won’t be solved in the foreseeable future.)
Suppose in the extreme that it were always true.
Consider running analysis on every move until the end of the game, always choosing the bot’s preferred move. For the predicted score on each move to equal the predicted score on the next move, the predicted score must be the same on every move, all the way through to the final result. That would imply that the bot can correctly predict the end-of-game score against an equally strong opponent (i.e. itself) from any point earlier in the game. If the bot were to play itself at the komi it believes to be fair for any position, it would draw 100% of its games.
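To spell the chain out (a rough sketch, where $\hat{s}_t$ is just my shorthand for the predicted score after move $t$ of a game that ends on move $T$, and $\hat{s}_T$ is the settled final score once the game is over):

$$
\hat{s}_t = \hat{s}_{t+1} \ \text{for all } t \quad\Longrightarrow\quad \hat{s}_0 = \hat{s}_1 = \cdots = \hat{s}_T = s_{\text{final}},
$$

i.e. the move-to-move equalities chain all the way down to the actual result, so even the very first prediction of the game would already have to be exactly right.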
That doesn’t quite imply that the bot is optimal (after all, it doesn’t rule out blind spots that would get an even better result than the bot’s move), but it still does mean that the bot, despite searching many moves with a bunch of visits on every turn, can’t find any improvements on its own and, at least among the moves it considers, is never surprised to find anything to be better or worse than expected. That’s quite beyond where current bots are on 19x19; in practice we likely won’t see this kind of behavior until bots are much closer to optimal.
Okay, sure, that’s an extreme. It can still be decently reliable that a bot’s scores on successive moves will match if you follow the “blue move”; obviously we’re not expecting that it’s literally always true.
But a game only lasts 200-300 moves, and the self-play draw rate isn’t so high yet - we would expect a good fraction of games to have at least one or two surprises and swings, and even some of the drawn games can also have surprises that cancel out.
That means that, at a bare minimum, we might expect something on the order of 1% of positions (perhaps more in practice) to have the bot be surprised and/or to have the scores mismatch across a move… and even 1% is already enough that, across all the games people are playing and analyzing every day, moves where the bot is surprised or misjudges something should actually be pretty commonplace for people to find, even if high-visit searches are used.
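Just to make the order of magnitude concrete, here’s the sort of back-of-envelope arithmetic I have in mind (every number in it is an assumption picked purely for illustration, not a measured statistic from any bot):

```python
# Back-of-envelope for the "order of 1%" estimate above.
# Every number here is an assumption chosen for illustration, not a measurement.

moves_per_game = 250             # a 19x19 game typically runs about 200-300 moves
frac_games_with_surprises = 0.5  # assume half of self-play games contain a swing
surprises_per_such_game = 2      # assume a couple of surprise moments in each of those

# Expected fraction of positions where the score jumps across a move
surprise_rate = frac_games_with_surprises * surprises_per_such_game / moves_per_game
print(f"per-position surprise rate: ~{surprise_rate:.1%}")   # ~0.4% with these numbers

# Even a tiny per-position rate adds up across everyday analysis
games_analyzed_per_day = 10_000  # again, a made-up figure, just for scale
positions_per_day = games_analyzed_per_day * moves_per_game
print(f"surprising positions per day: ~{surprise_rate * positions_per_day:,.0f}")
```

Tweak the assumed numbers however you like; as long as surprises happen at all at the per-game level, the per-position rate lands somewhere in the fraction-of-a-percent to few-percent range, which is plenty for people to stumble on them regularly at scale.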
And… that’s kind of not too far from what we see, as far as I can tell. At least, it’s good to recognize that properties like reliable move-to-move score consistency can be a really big and difficult ask, far more than one might think at first.