Have you ever felt it was unfair when a lower-ranked player beat you
while you played as usual? Was he cheating? Or is it a matter of ranks
being measured differently across Go servers? Is there a way to gauge the
absolute strength of your opponent, regardless of win rates, just by
evaluating the quality of his moves, to answer the question: “is he really
better than me, and if so, by how much?”
There are some surveys with rank comparisons available online, but they
are not very helpful because they essentially compare win rates rather than
actual strength. They do not reflect the fact that you may sometimes
struggle more against a player who is “weaker” by rank but actually
stronger than you. This can be very confusing.
I had that confusion myself and decided to develop a tool to resolve it once
and for all. The idea behind the tool is very simple: building on my bare
KataGo neural net in JS project (where you can play against the AI at 2
levels), I made a simple UI with a text area to paste an SGF into and an
“Analyze” button. It works as follows: every move in the game is evaluated by
the neural net and assigned its rank in the net’s policy output, e.g. you
played AI move #1 (blue spot) or AI move #138 (a very bad one). Then the
average is calculated for each player, giving something like: “your average
move choice is move #20 and your opponent’s is, say, move #23”. In that case
you probably won the game, but not necessarily: you may have played better
overall yet made one crucial mistake and lost. Such a game is bad from the
win-rate perspective, but it may still be valuable in terms of the general
performance you showed.
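The averaging described above can be sketched as follows. This is only a minimal illustration, not the tool’s actual code: the function names `policy_rank` and `average_rank` are hypothetical, and each position’s policy is assumed to be a plain list of probabilities (one per candidate move), with the played move given as an index into that list.

```python
def policy_rank(policy, played_index):
    """Return the 1-based rank of the played move in the policy output.

    Rank 1 means the move was the net's top choice.
    """
    played_prob = policy[played_index]
    # Count the moves the net liked strictly more than the one played.
    return 1 + sum(1 for p in policy if p > played_prob)


def average_rank(moves):
    """Average policy rank per colour.

    `moves` is a list of (colour, policy, played_index) tuples, where
    colour is "B" or "W" (an assumed input shape, for illustration only).
    """
    ranks = {"B": [], "W": []}
    for colour, policy, played_index in moves:
        ranks[colour].append(policy_rank(policy, played_index))
    # Skip a colour that has no moves to avoid dividing by zero.
    return {c: sum(r) / len(r) for c, r in ranks.items() if r}
```

A lower average means the player’s moves were, on the whole, closer to the net’s first choices.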
Here are some insights I’ve extracted from games I played on
different Go servers over the last couple of weeks:
GAME 1, (OGS 14kyu) vs (OGS 11 kyu, me[cft7821g]) W+R
Average black performance is NN move #29
Average white performance is NN move #24
I had slightly better performance and won the game
GAME 2, (OGS 13kyu) vs Me W+4.5
Average black performance is NN move #22
Average white performance is NN move #18
Here the performance difference is bigger, but the win wasn’t as easy
GAME 3, (OGS 13kyu) vs Me B+R
Average black performance is NN move #24
Average white performance is NN move #25
Here I lost because I got sucked into my opponent’s weird moves
GAME 4, (Me, KGS 7kyu) vs (KGS 6kyu) B+33.5
Average black performance is NN move #11
Average white performance is NN move #10
I had slightly worse performance but won crushingly.
Why is my performance so much better on KGS?
GAME 5, (Me, playok ~1050) vs (playok ~1050) B+1.5
Average black performance is NN move #22
Average white performance is NN move #29
My performance is about the same as on OGS; playok ~1050 is like OGS 13kyu
GAME 6, (Me, playok ~1050) vs (playok ~1000) W+R
Average black performance is NN move #20
Average white performance is NN move #10
Although I played better than usual, my opponent’s performance
does not match his rank or his previous games, so it’s very
likely he was cheating
GAME 7, (OGS 4d) vs (OGS 6d) W+R
Average black performance is NN move #6
Average white performance is NN move #2
Summary:
Now, after I play a game, I can get an absolute value for the performance
my opponent and I have shown, regardless of which rank either of us has
and on which server the game took place.
SGF game analyzer is a free online tool, feel free to give it a try at:
How long does it take to analyze? Does it require certain browsers or extensions? When I click Analyze it just hangs and stops responding, and I have to force close the window.
I tried both sites with a game played on my 1k account as white (I won against a 3k). The website “how deep is your go” estimates my rank at 0.4k. The one mentioned by OP indicates
Average black performance is NN move #285
Average white performance is NN move #308
I appreciate the idea and its simplicity. Some thoughts:
If you open with unconventional openings, these will rank a lot lower, despite openings not mattering all that much. So the more conventional player will rank higher.
Move level probably corresponds better with move strength after the opening. Maybe you can optionally exclude the opening (the first X moves) from the overview.
In addition, all it takes to lose a game is one huge blunder. Some insight into how often a move was below an arbitrary level (bottom 50%?) might be a useful addition.
Of course the more you build out, the more it approaches the KaTrain strength overview: Three histograms of points lost per move - one for the beginning (first X moves), middle and endgame.
That would be ideal but I understand that’s not something you can offer for free.
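The two suggestions above (skipping the opening and counting how often a move falls below some level) could be sketched like this. It is only an illustration working on a list of per-move ranks for one player; `summarize_ranks`, `skip_first` and `threshold` are hypothetical names, not options of the actual tool.

```python
def summarize_ranks(ranks, skip_first=0, threshold=50):
    """Summarize one player's per-move policy ranks (1 = net's top choice).

    skip_first: number of that player's opening moves to ignore.
    threshold:  ranks above this value count as "low-ranked" moves.
    Returns (average_rank, share_of_low_ranked_moves), or None if no
    moves remain after skipping the opening.
    """
    kept = ranks[skip_first:]
    if not kept:
        return None
    average = sum(kept) / len(kept)
    low_share = sum(1 for r in kept if r > threshold) / len(kept)
    return average, low_share
```

Reporting the share of low-ranked moves next to the average shows whether a poor score comes from steady small inaccuracies or from a few very unusual moves.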
I use Chrome. When you click Analyze, it should print the board on screen move by move and, once finished, print the average NN move choice number. On my Raspberry Pi 5 it takes around 0.2 sec per move. It’s a bit strange that it doesn’t work for you. Did you previously use my Go ? If so, you need to unregister the app in dev tools to clear the cache, then reload the page with the “Disable cache” flag set under the Network tab. If you’re using it for the first time then it’s strange. Does WebGL work for you on other websites?
You’re spot on. Surely, redoing what KaTrain already does better makes no sense. Regarding opening moves: that’s true, but neither I nor my opponents play unconventional openings, so if it becomes a problem one day I may add a feature to exclude opening moves. Still, the contribution of opening moves is not that big. I tried Cho Chikun’s games from 1968, when he was 1p and the openings were vastly different from what is played today, and his play still scored average NN move #4.
Regarding blunders: a blunder fits the idea of the average value perfectly, since it lowers the overall performance.
Anyway, just to clarify the main reason why I made it and why I’m happy to use it: the average value (which might not mean much on its own) gives me a constant representation of playing performance. I know I played with quality #20 or #25, and this value would be the same across different servers, so I know whether I played as usual or a bit better, regardless of the game result. It might happen that I played badly but won due to my opponent’s blunder, or played well but blundered and lost, so I’m interested in measuring the performance, not the result, because the result is already measured by the ranking system.
I’ve been checking some FOX 1d games and the range is from #16 to #5.
Some notes to bear in mind:
SGF should contain only 19x19 games, since NN only supports those
Avoid SGFs with AI review or any other kind of comments
Do not treat the resulting value as a strict ranking; it’s more a way to evaluate the particular shape you were in during the game
Just to elaborate a bit. In chess my lichess rating is around 1750, but sometimes, if I get the type of position I like, I can beat a 2200 player in blitz. My performance would then be like 2200, but it doesn’t mean my rating is 2200. Same here: sometimes you may play better than your rank, sometimes worse, and sometimes, if the opponent plays weird moves, I play weird moves in response because I don’t know what to do, so the game scores lower than usual. So it’s a metric for the current game. E.g. the way you play a teaching game would score lower than a competitive game for a title in a tournament.
The app is not very robust and it’s for my personal needs only, but since I find it useful I thought it was worth sharing. The NN is an old KataGo net, so #1 is like OGS 3d maybe, no more. For 5d+ players the results might not always make sense, because they may play moves that are better than this net’s choices while the net rates them lower. With a stronger/bigger net the analysis would take much longer. For kyu players below OGS 1 dan, though, this should work well.
Finally, the way I use it is to track the resulting value for my own games, e.g. if I play badly it’s around #25; once, when I lost by only 0.5 points to an OGS 4 kyu, my performance was #19. On the other hand, an OGS 4 kyu usually beats a FOX 1 dan, yet FOX 1 dan games usually get #16 to #13.
You may think of the result as being like those stats in KaTrain where, after you make a move, it tells you that your move is NN choice number N and the best NN move is X. Now imagine the average of N calculated as (move1+move2+…+moveN)/number_of_moves. I only need a stable metric to track across different servers and different rankings, so if an opponent performs at, say, #20, I know how it “feels” to play him regardless of his rank. Again, I am 6 kyu on KGS and 11-10 kyu on OGS, but according to surveys KGS 6 kyu is like OGS 8 kyu; from the strength perspective that’s true, but not from the win-rate perspective.
The way result is calculated is available in the source code:
I use the Brave browser, and in dev mode I saw errors initializing WebGL, although a simple WebGL check showed that it can work. After some digging into the settings I was finally able to run it (I have a script extension that blocks some execution). The records I used are from IGS and they did contain comments, so I had to edit them myself to clear those before they could be loaded. (So it was a double issue for me: both loading the engine and the records themselves.)
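For anyone who needs to clear comments from many records, a small script can do the editing. The following is a rough sketch rather than a full SGF parser, and `strip_sgf_comments` is a hypothetical helper, not part of the analyzer. It assumes comments are stored in the standard `C[...]` property and that the text `C[` does not also appear inside another property’s value.

```python
import re

# Matches a comment property C[...], including escaped characters such as "\]".
# The lookbehind leaves properties that merely end in "C" (e.g. PC, GC) intact.
_COMMENT = re.compile(r"(?<![A-Za-z])C\[(?:\\.|[^\]\\])*\]", re.DOTALL)


def strip_sgf_comments(sgf_text):
    """Return the SGF text with all C[...] comment properties removed."""
    return _COMMENT.sub("", sgf_text)
```

Usage would be along the lines of reading the file, passing its text through `strip_sgf_comments`, and pasting the result into the analyzer.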
Now this makes perfect sense. It’s always a dilemma whether to share a personal project or not, because the way I use it and the way others may want to use it are always different, and the development philosophy for personal vs public projects obviously has different priorities.
I also noticed it doesn’t work on mobile, which is very strange because my PWA app lets you play against the NN as an Android app on a smartphone perfectly well. I need to figure out what’s wrong with the mobile version. If I make an update it will be posted here.
My comment wasn’t meant as a critique though, just some technical issues I had that required clarification, since I was curious to try it myself. And thanks for sharing and open-sourcing the project.
As for my testing results, from your explanation of running a fast small network I already assumed it wouldn’t be of much help at my high dan rank range, and my tests confirmed that: even professional games (and my games) would range from 3d to 7d, without much distinction. They are simply beyond the network’s scope, I believe.
Yes, exactly. The NN should be a few stones stronger than the players who played the game to provide a meaningful analysis, and even with the latest KataGo nets MCTS with many playouts is still used to analyze a game properly. In my case it might be treated as 1 playout, since there’s no MCTS and the net strength is around KGS 1 dan (should be OGS 2-1k). But for kyu players it might be very insightful.
The UI has been improved: the user can now select the move range to analyze (e.g. your performance in fuseki might be better than in chuban), and it should now also work on mobile.
IMPORTANT: you may need to clear the browser cache to make the latest updates available, and you may also need to unregister the service worker under devtools->Application. Once done, the board should be rendered on an HTML canvas and the move range selectors should appear.