Make Tsumego Great Again!

Alright, let’s get right to business:

I suggest that instead of fuzzy suggested rankings, we do something useful with the tsumego and use the Elo system to properly judge how hard it’s going to be for someone with a certain rating.

Assign an Elo score to every problem, treat it as a player. If someone fails to solve it, it gains Elo just like a player would; if it’s solved, it loses points.
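
For concreteness, here’s a minimal sketch of what that update could look like - nothing more than the textbook Elo formula applied to a player/problem pair. The K-factor and names are purely illustrative, not anything OGS actually implements:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_after_attempt(player_elo: float, problem_elo: float,
                         solved: bool, k: float = 32.0) -> tuple[float, float]:
    """Treat the problem as a player: it 'loses' when the attempt succeeds."""
    expected = expected_score(player_elo, problem_elo)
    actual = 1.0 if solved else 0.0
    delta = k * (actual - expected)
    # Zero-sum update: whatever the player gains, the problem loses.
    return player_elo + delta, problem_elo - delta
```

Failing a problem rated far below you costs you a lot and bumps the problem up a lot; solving one rated far above you does the reverse.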

Whether it’s better to initialize a player’s “tsumego elo” to their overall elo or to something else, I don’t know. But I do know that the estimated difficulty of lots of problems is completely off the mark.

The obvious benefit? When I tell people to do “x% easy problems, y% medium and z% hard problems for their level”, they will actually be able to use a reliable metric.

Let me know what you think!

33 Likes

I love it.

How/where do we start?

1 Like

What would be ideal is a way to do tsumego where you can’t back out. So you think of the solution, then you have to play it out exactly the way you thought it, so that there’s no trial and error…

2 Likes

Well, I’ve given that some thought, and the best I could come up with was to make only the first attempt “rated” and all further attempts unrated; that way people would have an incentive to actually solve the problem instead of rushing to get the solution.
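
A hypothetical sketch of that rule (none of these names exist on OGS): remember which player/problem pairs have already had their rated attempt, and only apply the Elo update the first time around.

```python
# Hypothetical: only a player's first attempt at a problem is rated.
rated_attempts: set[tuple[int, int]] = set()  # (player_id, problem_id)


def record_attempt(player_id: int, problem_id: int, solved: bool) -> bool:
    """Return True if this attempt counted for rating purposes."""
    key = (player_id, problem_id)
    if key in rated_attempts:
        return False  # a retry: practice only, no rating change
    rated_attempts.add(key)
    # ...apply the Elo update from the sketch above...
    return True
```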

The obvious downside is that some people might be inclined to then solve the problem on a different board/editor and thus inflate their rating (and deflate the problem’s difficulty rating), but I am fairly confident that such “disruptive” behavior is going to be the exception, not the rule.

Perhaps with some additional effort, we could just hide the actual tsumego-elo (so people don’t go into brag-mode) and only show categories based on elo difference: anything from “like taking candy from a baby” to “impossible”.
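
Something like the following, say - the labels are from this thread, but every threshold is pulled out of thin air for illustration:

```python
# Map the Elo gap (problem minus player) to a label instead of
# exposing the raw number. Thresholds are invented for illustration.
DIFFICULTY_BUCKETS = [
    (-400, "like taking candy from a baby"),
    (-250, "effortless"),
    (-150, "very easy"),
    (-50, "easy"),
    (50, "challenging"),
    (150, "hard"),
    (300, "very hard"),
]


def difficulty_label(problem_elo: float, player_elo: float) -> str:
    gap = problem_elo - player_elo
    for threshold, label in DIFFICULTY_BUCKETS:
        if gap <= threshold:
            return label
    return "impossible"
```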

7 Likes

If this gets implemented, does that happen for all tsumego problems automatically? Without some quality control, I imagine faulty tsumego would throw off the metric immensely, making impossible puzzles likely to be ranked high, as nobody solves them.

Or what about puzzles where not all the solutions are implemented?

And if we’re busy with this anyway, why not add a time limit to the whole thing? The faster you finish the problem, the better the player / easier the problem.

I would also use the system to rate only the problems, not the players. Although the two are correlated, I don’t think being good or bad at tsumego is a direct metric for how well someone plays. If we rate the player anyway, I suggest giving it as an extra statistic only, and not letting it influence the player’s rank.

3 Likes

Great idea, and exactly how goproblems.com works if I remember correctly.

> The obvious downside is that some people might be inclined to then solve the problem on a different board/editor and thus inflate their rating (and deflate the problem’s difficulty rating), but I am fairly confident that such “disruptive” behavior is going to be the exception, not the rule.

If they do it for all problems, it doesn’t matter. They will perform as a stronger player than they actually are, but from the system’s view they will simply be, say, a 2 dan instead of a 4 kyu, and the Elo of the problems will still be correct. If they only do it for some problems, then the ratings of those problems will get artificially deflated, but I think that will be smoothed out in the long run.

5 Likes

Thanks for all your questions and comments!

I am fairly sure it is not that difficult to implement “flag problem for review” functionality. Alternative solutions are fairly easy to check (Tsumego Pro also has that feature), even if actual impossibility is harder to detect (though it is trivial to see which problems are possible candidates).

Time limits would be useful, but more complicated to implement as it raises other questions and necessitates further design decisions. I’d be happy to see time limits incorporated, but it’s not necessary and perhaps even unwise to change everything in one fell swoop.

I never suggested having the tsumego elo influence the player’s rank. What I did was raise the question (as I do not know the answer) of whether it would make sense to let the player’s rank inform their initial tsumego elo.

4 Likes

I use goproblems.com a lot (excellent site; unfortunately it seems to be semi-abandoned, and the last post by the owner on his own forums was over 6 months ago). You could try writing an e-mail to the owner and asking about his rating algorithm. His name is Adam Miller and he has his e-mail on the site somewhere.

Good problem ratings are hard to do. Around my kyu range, I feel that many ratings on goproblems are inaccurate; for instance, I will see “easy” 12 kyu problems and “hard” 18 kyu ones. I also have no idea how he deals with rating deflation: there are users constantly starting at 30 kyu (both new and returning), so the problem ratings will steadily decrease over time if nothing is done about it.

Now that we’re comparing with goproblems, I think the biggest issue over there is not the ratings, but the fact that there are so many low-quality problems (usually incomplete solutions or sub-optimal play). There is a “flag for review” system, but it’s obviously not enough, because sometimes the problem author doesn’t care to go back and fix it, and sometimes he doesn’t know how to fix it (some problems are submitted by low-rank players).

I think someone would do a good service to the Go community if they just downloaded the entire goproblems database of over 11,000 problems, selected the 2,000 or so best ones, removed all the bad problems, and simply re-uploaded them somewhere else (copyright issues notwithstanding).

4 Likes

I totally get your point, but you are probably also aware that curating any kind of collection is a huge effort, right? We’re talking weeks upon weeks of reviewing that would be necessary. Unpaid, largely thankless effort.

I agree that it is (already) necessary to curate the tsumego collection, but the reality of it is that we will (~would) need many strong volunteers to do the editing, as well as many users to do the submitting/flagging. Moreover, this is not a problem specific to making problems rated; it mostly concerns the method of estimating their difficulty for any given user. All it does is improve the current state of affairs. Without a doubt, there are many more refinements to be employed (and I welcome all suggestions!).

I think the problem with goproblems.com you’re citing here stems from the mix-up between rating and rank projection. Obviously some people, especially DDKs, will have vastly different tsumego profiles, but this should all even out if we forget about the need to make it seem more accurate (x kyu/dan) than the precision of our instrument (rating) allows for. That’s why I think the only useful distinctions are something like “impossible” - “very hard” - “hard” - “challenging” - “easy” - “very easy” - “effortless”.

1 Like

Unfortunately, if you don’t take care of the quality of the problems, a “very hard” challenge might be barely distinguishable from a poorly set-up one, i.e. one that nobody gets right on their first attempt.

I do like the idea, but to make it work you would need to think about some curation of the problem database.

1 Like

Just chiming in to say I would visit OGS far more often if it had a Lichess-style tactics trainer, which seems like the gold-standard version of what you are discussing.

6 Likes

I’m pretty sure OGS is trying to become the lichess of everything Go; at least visually it feels very similar to lichess, and there’s the migration to Glicko as well, haha.
Long way to go, but a good path, I believe.

6 Likes

Yes, someone already did it. Visit [redacted] → there’s a goproblems compilation selected from the best tsumego.

4 Likes

I just had an idea… maybe @GreenAsJade can tell us if this is already possible.

For the time being, this should be the easiest fix. Disable move shadowing for puzzles. How 'bout that?

That’s great, thanks!

1 Like

I believe lichess.org uses an AI to harvest tactical positions from human vs. human games. Is such a thing feasible here as an alternative to human curation of the collection?

Finding tsumego by alternative means is interesting in its own right, but less of a problem than halfway accurately assessing how difficult they are to solve.

We have thousands of problems at our fingertips, but very little information “about” them. Measuring difficulty in 1-5 stars when there are 30+ semi-distinct levels of play is quite vague (also ignoring the fact that judging tsumego at a glance is necessarily biased), and having a numerical value that’s even a little more accurate would allow us to do many more nifty things. Moreover, I think weiqi101 even has them sorted by themes (as does Cho’s collection, though only implicitly).

1 Like

Would current AI programs be capable of assigning some values to the puzzles and positions as they are harvested and processed?

I’m not trying to be snide. I’m genuinely ignorant of how these programs are written and implemented. But it seems to me that sorting by theme and difficulty should be (relatively) straightforward for a program that’s already capable of identifying tactical decisions in human vs. human games.

Also, wouldn’t it be possible to assign each puzzle an Elo rating, and a separate puzzle rating for each player? That way, the puzzle ratings will sort themselves somewhat automatically. If a puzzle “loses”, i.e. is solved, then its rating will decrease in proportion to the puzzle rating of the player who solved it. If a puzzle “wins”, then its rating is bumped up.


I found a bit of info on how puzzles are generated on lichess.org, which I consider to be a kindred spirit of OGS.

2 Likes

Um… from my OP:

> I suggest that instead of fuzzy suggested rankings, we do something useful with the tsumego and use the Elo system to properly judge how hard it’s going to be for someone with a certain rating.
> Assign an Elo score to every problem, treat it as a player. If someone fails to solve it, it gains Elo just like a player would; if it’s solved, it loses points.

Concerning “AI” judgements… there’s nothing “intelligent” about algorithms as far as I can see. We would still need a set of measures to use as parameters for estimating how difficult any given problem is, and the result would be a hierarchy of problems derived from those ‘objective measurements’. I don’t know for sure, of course (as I haven’t had the chance to compare), but I think it would be better to use the simple Elo metric, as it natively tells us something about the relation between problem and solver - in the form of the probability of someone solving the problem, based on prior experience with similarly performing solvers.
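
That probability is just the Elo expected-score formula; a quick worked example with invented numbers:

```python
def solve_probability(player_elo: float, problem_elo: float) -> float:
    """Expected chance the player solves the problem, under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((problem_elo - player_elo) / 400.0))


print(solve_probability(1500, 1500))  # 0.50  - even odds
print(solve_probability(1500, 1700))  # ~0.24 - a "hard" problem for this player
print(solve_probability(1700, 1500))  # ~0.76 - an "easy" one
```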

As for the generation of tsumego, there is the little-known program GoTools, but it’s kinda hard to use, at least for me. I can’t even get it to solve simple problems… I haven’t checked problem generation. I suppose the only halfway useful thing one could do with automatic generators is to say “make a problem with n steps to solve”, though again I don’t know how long that would take (probably longer than for chess, since Go’s branching factor is much, much larger and a definitive categorization requires exhaustive search).

2 Likes