Anyone else notice something odd and unusual with Dwyrin's match against Crazy Stone Deep Learning?!

davidye · October 20, 2016, 12:20am

While recently watching Dwyrin’s game against Crazy Stone 2016 Deep Learning Edition, I came across something really odd. Actually, many things. The video in question is called “Bats Go Lecture - Crazy Stone 7d Review!!” and was posted on his YouTube channel on Jun 2, 2016.

Since I also own Crazy Stone 2016 Deep Learning Edition, I decided to follow along. As far as I could see, Dwyrin did not publish the accompanying .sgf file anywhere. Interestingly, unlike Haylee’s earlier match with Crazy Stone back in May 2016, it didn’t appear that he live-streamed this game, nor does the video show him playing the game, or even a recording of it, inside Crazy Stone. He appears to be re-enacting it from a record of some sort, though it isn’t apparent how. What is apparent is that he is using the Goban client from KGS, rather than playing or reviewing the game in Crazy Stone itself.

So, since I wanted an sgf of it, I had to follow along manually and essentially re-create the game record myself from the moves and positions that Dwyrin showed in his YouTube video.

I soon encountered many discrepancies and outright contradictions that seemed very odd and that I felt could not be reasonably explained away:

Many times Dwyrin stated that CS made a certain move, but when I followed along, CS did not make that move at all.
Other times Dwyrin claimed CS would act very weirdly and play in one area of the board, only to then jump to an entirely different area, and then come back to the original area some moves later. When I played out the same sequences in CS, this never happened for me.
Several times throughout the game, Dwyrin noted that CS evaluated his position (Dwyrin played black) as very unfavorable. Referring to the bottom “bar” that shows which side is ahead, he went so far as to say that CS judged he had essentially no chance to win and that White (the computer) had all but won. This did not match what I observed at all. In every instance where he stated the evaluation bar was heavily in favor of the AI, when I followed along exactly, move for move, with his video, the bar was always close to 50/50 and never leaned heavily one way or the other. I tested this in both Crazy Stone 2016 Deep Learning Edition 1 Version 1.00 and Crazy Stone 2016 Deep Learning Edition 1 Version 1.01, and the results were consistent with each other and not at all what Dwyrin reported.
At several points, Dwyrin remarked that CS made the evaluation mistake of thinking that one of his (Dwyrin’s) black groups was dead when it was actually still alive. But when I checked CS’s own territory analysis, it always reported those groups as alive and never marked them as ‘dead’ or as ‘white’. Indeed, at one critical juncture Dwyrin stated that only immediately after he made one critical move did CS suddenly realize that it had all along mistaken one of his black groups for dead. However, my analysis clearly showed that even dozens of moves before that point CS already evaluated that black group as alive, and at no point did it think the group was dead. Additionally, Dwyrin stated that right after this same critical move CS’s evaluation bar suddenly swung from heavily in favor of the AI to heavily in favor of Black, i.e. Dwyrin the player. However, when I followed along, no such swing ever occurred! It was self-evident to me that what Dwyrin described did not reflect what actually happens when I ran an exact test.
**(Showing the white/black bar status indicator, compared to Dwyrin’s claims at the exact same board configuration)

How I ran the test:
*Where would the program actually play

I use the “hint” function to see where it would recommend to play via hints
I bring up the analysis window and have it run a detailed analysis
I have the program jump right in to see where it would actually play and where it actually does/did play.

*Capture screenshot of action.
*Show projected territory action (whether the game thinks a group is alive or dead, etc.).
*Show hint action (where the game’s next hint is).
*Show analysis ranking action: list of best recommended moves with stats and the alternative options that CS considers.

Anyone, even those without CrazyStone, can view the .sgf file that I uploaded to see that it matches identically with Dwyrin’s published video.

Those with CrazyStone Deep Learning Edition will additionally be able to follow along with the .sgf file and confirm that the observations I posted above have merit.
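For anyone who wants to check a record against the moves called out in the video (E9, C9 and so on) without clicking through a viewer, here is a minimal Python sketch. It is an illustration only: `sgf_moves` and `first_difference` are hypothetical helper names, the parser is a simple regex and not a full SGF reader, and it assumes a plain main-line 19x19 record with no variations and with each move as the first property of its node.

```python
import re

COLUMNS = "ABCDEFGHJKLMNOPQRST"  # board letters skip "I"
# Matches nodes such as ";B[pd]" or ";W[]" (an empty value is a pass).
MOVE = re.compile(r";\s*([BW])\[([a-s]{2})?\]")

def sgf_moves(sgf_text, size=19):
    """Return the moves of a simple SGF record as (colour, "Q16"-style) pairs."""
    moves = []
    for colour, coord in MOVE.findall(sgf_text):
        if not coord:
            moves.append((colour, "pass"))
            continue
        col = ord(coord[0]) - ord("a")   # SGF column, counted from the left
        row = ord(coord[1]) - ord("a")   # SGF row, counted from the top
        moves.append((colour, f"{COLUMNS[col]}{size - row}"))
    return moves

def first_difference(moves_a, moves_b):
    """Return the 1-based move number where two move lists diverge, or None."""
    for number, (a, b) in enumerate(zip(moves_a, moves_b), start=1):
        if a != b:
            return number
    if len(moves_a) != len(moves_b):
        return min(len(moves_a), len(moves_b)) + 1
    return None
```

Calling `sgf_moves` on the text of two records and passing both lists to `first_difference` gives the first move number at which they disagree, which is a quick way to compare a hand-made record with someone else's.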

In the interest of accountability and transparency and out of an abundance of caution, I have taken the liberty of archiving the same YouTube video using KeepVid and hashed it with SHA checksums whilst retaining a duplicate on Internet Archive.
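For anyone who wants to verify an archived copy the same way, a checksum can be computed with a few lines of Python. This is a minimal sketch: SHA-256 is an assumption (the above only says "SHA checksums"), and `sha256_of_file` is a hypothetical helper name.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large video file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two copies of the same download should give the same digest; any difference means the files are not byte-for-byte identical.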

Dwyrin is considered by many to be a prominent figure in the Go/Baduk world, and he appears to be a very strong player, so for him to review Crazy Stone 2016 Deep Learning Edition and be so wrong in his analysis and conclusions is concerning. He specifically stated that while he finds some value in the Crazy Stone 2016 Deep Learning Edition program, he does not necessarily trust its ability to judge alive/dead groups, nor the accuracy of its positional evaluation and its predictions of who is ahead. My major point of contention is that he is forming these opinions on a completely unfounded basis. I have reason to believe, and the empirical evidence has shown, that Dwyrin was incorrect in much if not all of his review of CS in the game that he purported to have played against Crazy Stone on “unlimited” time. His mistakes, whether inadvertent or made in good faith, undermine his review of CS 2016 and certainly call into question his judgment of CS’s ability to make correct evaluations, its strength of play, etc.

http://pastebin.com/RbjdizEX

https://is.gd/BatsCrazyCrazyStoneReview

http://eidogo.com/#DeJh717D - game uploaded to online Go viewer

http://imgur.com/a/dnUud - album of screenshots

Time Index 08:10
“After this variation was played, White was very much leading in Crazy Stone’s opinion”. “That the game was very very far in White’s favor”.

Time Index 15:40
“I’m almost positive that it thinks that I’m dead on the left hand side of the board. It thinks that all of my stones are dead. Because if we go back to the application, at this point in the game this little white bar was like way over into here, I think it only imagined like I only had a five percent chance of doing anything in this game”

Time Index 22:30
“Anyway so I played this, [White plays at E9] and he plays there, I connect cuz I don’t want to die, and the minute he played [Crazy Stone played Black C9] these variations were played, that bar that I had been referring to, on and off throughout this game, drastically swung in my favor. Like up until this point, it was clear that it was evaluating this incorrectly, that black was supposed to be dead.” … “And when this happened [C9] it drastically changed…”
One of the weakness that I heard about the program, in that it can judge a life and death problem incorrectly, and then when it does, you have a serious problem because it doesn’t try anything too hard, but yeah judging the problem incorrectly is a huge huge problem that this thing does have.

weston · November 21, 2016, 5:09pm

@davidye I can see you spent a good amount of time on this, and it’s all very interesting! I don’t have any version of Crazy Stone, but I have been casually following Dwyrin’s YouTube channel for a few years, and supporting him via Patreon. I don’t doubt anything you reported in this post, but what would motivate someone to fictitiously play out a game and falsely attribute it to a bot? Placing trust in both your post and Dwyrin’s videos, I can only hypothesize one of the following may be true:
(a) When seeing where the program would actually play, perhaps the “hint” function does not reveal every possible move the program might play. For example, maybe there is some randomization involved in its decisions. If this is the case, it would also affect how the bottom bar estimates the score (like how the score estimator here on OGS can give different estimates for the same whole board position).
(b) Perhaps Dwyrin was using an older version of Crazy Stone, like a beta version for example.

muzzy1 · November 22, 2016, 7:34pm

I don’t have Crazy Stone, but most computer players will randomize between two roughly equally good moves each game, so that they don’t repeat themselves. Further, if the computation amount (time*speed) is different, the evaluation will be different. Does it always play the same move if you play Q16 as move 1? I doubt it.

And the latter might make the evaluations completely different. How long did you give CS to evaluate? The same amount of time as he did exactly? Same computer? As it evaluates we could imagine an evaluation of a position going from good → bad → good as the reading gets deeper.

Further, are the versions the same? Even a small point release could correct some obvious defect that showed up in this video or elsewhere.
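To make the two effects above concrete (random choice between near-equal moves, and evaluations that depend on how much computation was spent), here is a toy Python sketch. It is not how Crazy Stone works internally; it only assumes an engine that estimates a win rate by sampling game results. `estimate_win_rate` and `pick_move` are made-up names, and the scores are invented numbers.

```python
import random

def estimate_win_rate(true_win_prob, playouts, rng):
    """Estimate a win rate by averaging `playouts` simulated game results."""
    wins = sum(1 for _ in range(playouts) if rng.random() < true_win_prob)
    return wins / playouts

def pick_move(candidates, tolerance, rng):
    """Choose at random among moves scoring within `tolerance` of the best."""
    best = max(candidates.values())
    near_best = sorted(m for m, s in candidates.items() if best - s <= tolerance)
    return rng.choice(near_best)

if __name__ == "__main__":
    rng = random.Random()
    # Fewer playouts give noisier estimates of the same even position.
    for playouts in (100, 10_000):
        estimates = [estimate_win_rate(0.5, playouts, rng) for _ in range(5)]
        print(playouts, [round(e, 3) for e in estimates])
    # Invented scores: the top two are close, so either may be chosen.
    scores = {"Q16": 0.512, "D4": 0.510, "K10": 0.430}
    print([pick_move(scores, 0.01, rng) for _ in range(5)])
```

Run a few times, the 100-playout estimates wander noticeably around 0.5 while the 10,000-playout ones stay close, and the chosen move flips between the two near-equal candidates. Noise of this kind would not by itself explain a bar sitting near 5% versus near 50%, but it shows why exact move-for-move reproduction is not guaranteed.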