Make Tsumego Great Again!

Had to google it. But still not sure what you mean.

1 Like

As far as I know they’ve already figured out how to do this well for chess tactics, so borrowing their solutions would be desirable.

1 Like

I think a categorisation which just compares the puzzles relatively to each other like this

is better than trying to tie the puzzles to a particular rank/rating

There are many aspects to the game of go other than just tsumego, and they’re all bundled up in a person’s rank. I think it’s usually too confusing to label go problems by rank, since there is no one universal rank, and not everyone does tsumego anyway. A rank label carries the implied promise that “if you’re this rank you should be able to solve all the problems ranked lower than this”, but that probably won’t be the case, and so the puzzles’ difficulty just won’t line up with ranks. I imagine you just end up with comments like this

I think you might mention though, that most normal go players can and should subtract a healthy percentage from their rank to find the right problem books. I.e. as an 8k, I can barely solve the 12 to 14k Tsumego. I have seen 5Dan players struggle with the 3K problems…

Even if it’s granulated further to

what really makes a puzzle a dan level? Is it the number of variations to try out, the number of moves ahead you have to read, the number of distinct tesuji involved etc?

The only thing that really makes sense for a puzzle collection is whether one puzzle is ‘harder’ than another. Even if people find a puzzle easy (possibly because they’ve done the same or a similar one before), they can probably still judge whether it would require more work to figure out the answer from scratch compared to another puzzle.

I think this makes sense, whenever you have the data available.

1 Like

Tbh, I forgot what this thread was about, and I did like some things, so I probably had read it before. I’ll write two posts though instead.

So if we’re back to the original idea of trying to clean up the OGS puzzle section, then sure, it would be nice to be able to give a (separate?) ‘average rating’ to a puzzle or puzzle collection that depends on the level of the players who’ve successfully solved or failed it. It would in part take away the arbitrary judgement that “a 12 kyu should be able to solve this”, and replace it with “typically 12 kyus have solved this”.

I think when you allow people to vote there are a lot of factors to consider, and you need a good system in place. You probably need something simple, like just an up/down/no-vote button, which people could use to nudge the apparent difficulty of a problem. If they get a free vote or a free choice of rating, there’s too much potential for outliers and trolling.

I think this only makes sense when you test the solution properly. For instance, the OGS puzzles randomly select a response from the tree, from what I remember. If you only had one line, for example the strongest response to the players move, then if the player knows that line, they’ll solve it quickly.

How do you know, though, that they’ve really thought through all of the possibilities and responses, and so really solved the problem? In theory you could be comparing data from people who click, solve and move on against people who sit down and consider all the possibilities first and then answer.

the OGS puzzles randomly select a response from the tree

We should ideally have a feature which allows the puzzle creator to specify the strongest resistance and have that played whenever possible.

That probably would be a nice feature. Maybe it could be a toggle depending on whether you just wanted to explore the puzzle and see what kind of responses were in the tree as opposed to just wanting to straight solve it against the toughest response.
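A sketch of how that toggle might look, assuming a hypothetical variation-tree representation where the puzzle author can flag one child as the strongest resistance (all names here are illustrative, not the actual OGS data model):

```python
import random

def pick_response(children, strongest_only=True, rng=random):
    """Choose the opponent's reply from a puzzle's variation tree.

    children: list of dicts, one of which the author may have flagged
    with "strongest": True (a hypothetical field for this sketch).

    With strongest_only=True, always play the flagged branch if there
    is one; otherwise fall back to the current (random) behaviour so
    players can explore the tree.
    """
    if strongest_only:
        flagged = [c for c in children if c.get("strongest")]
        if flagged:
            return flagged[0]
    return rng.choice(children)

# Toggle on: always meets the toughest resistance.
children = [{"move": "A"}, {"move": "B", "strongest": True}]
toughest = pick_response(children)              # -> the "B" branch
explore = pick_response(children, strongest_only=False)  # random branch
```

The toggle then just maps straight onto the `strongest_only` flag: “straight solve” mode sets it, “explore” mode clears it.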

It might also be worth deciding again on the focus of the discussion.

Is the proposal

  • to change the current OGS puzzle section, where users can still make their own puzzle collections.

OR

  • to implement a new puzzle feature, say like the Joseki Explorer. This one maybe would be more similar to sites like lichess.

I think I have different feelings depending on the options above. For instance, maybe you can just go ahead with the idea of assigning a rank to a puzzle based on the players who solve it. I still think what counts as solving it is debatable. I mean, you could have something like key lines for each puzzle, if there’s more than one tricky response.

1 Like

One thing to consider is that Lichess has a lot of puzzles, I think because they used a bot to farm them out of users’ games. OGS puzzles are all created by humans and we don’t have that many. So what works well on a large scale may have unforeseen effects on a smaller one.

2 Likes

Why are people discussing having the puzzle author assign difficulty? Give the puzzles a Glicko rating, give the players a Glicko rating, and let everything sort itself out.

In terms of what good tsumego software should look like: 101weiqi, but in English.

4 Likes

Currently the puzzle author does assign difficulty.

Does that make sense? If I’m reading it right, it sounds like each user gets an Elo rating, and whenever they try a puzzle they ‘match’ against the puzzle: the ‘winner’ gains Elo while the ‘loser’ loses it. Why does it make sense to treat puzzles as though they were players playing a game against a user?

I just imagine a large-scale deflation of puzzle Elo ratings, depending on the setup. Or stranger results. Imagine I’m a 12 kyu and I’m doing a set of 12 kyu puzzles (supposedly of equal difficulty). If I get a bunch of the first few questions right, those go down in rating and I go up. Then the rest of the 12 kyu puzzles won’t go down as much when I beat them, because I’m now higher rated.

At least this would be what I imagine in the current puzzles setup. The first few problems of a set probably get played more than the rest.

Would you suggest a new Puzzles feature similar to the joseki explorer, or would you suggest randomising the order of puzzles in a set that a user uploads, to compensate for the possible bias of users new to the set trying the initial puzzles a lot?

1 Like

Adding Elo/Glicko ratings to puzzles would only cause deflation in puzzle ratings if the average puzzle is more difficult than the average player can solve. Even then, it wouldn’t ultimately matter, since the ratings would be internally consistent. A 12k puzzle would be something that a 12k puzzle solver could solve 50% of the time; whether a 12k puzzle rating corresponds to a 12k Go rating wouldn’t be relevant.

I don’t think there’d be many strange results. In your example, for instance, you’re describing minor fluctuation of puzzle ratings after one person plays them. That wouldn’t have a significant impact on those ratings, though: so long as those 12k puzzles maintain around a 50% “victory” rate for 12k players, they’d be rated as 12k. Gaining 0.1k or something in rating won’t make a noticeable change for anyone else. If lots of 12ks are all consistently solving a puzzle, then the rating on the puzzle should drop. Short term randomness will be corrected over time, just like all ratings are. To the best of my knowledge, almost every tactics trainer for chess or joseki trainer for Go uses a rating system for handling puzzle ratings, and they work far better than the alternative.
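For concreteness, here is a minimal sketch of the standard Elo update applied to a puzzle attempt. The K-factor of 20 and the 400-point scale are the usual textbook defaults, not anything OGS-specific; a real system would more likely use Glicko with rating deviations, as suggested above.

```python
def expected_score(r_player, r_puzzle):
    """Standard Elo expected score for the player against the puzzle."""
    return 1.0 / (1.0 + 10 ** ((r_puzzle - r_player) / 400))

def update(r_player, r_puzzle, solved, k=20):
    """Update both ratings after one attempt.

    solved=True counts as a win for the player. The puzzle's rating
    moves by the same amount in the opposite direction, which is what
    keeps the pool internally consistent: a puzzle that 12k players
    beat more than half the time drifts down until they don't.
    """
    e = expected_score(r_player, r_puzzle)
    s = 1.0 if solved else 0.0
    delta = k * (s - e)
    return r_player + delta, r_puzzle - delta

# An evenly matched attempt (both rated 1500) that the player solves
# moves each rating by k/2 = 10 points, in opposite directions.
player, puzzle = update(1500, 1500, solved=True)  # -> 1510.0, 1490.0
```

This also shows why one solver barely matters: a single attempt shifts a puzzle by at most K points, and only a sustained run of wins or losses against similarly rated players moves it noticeably.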

4 Likes

It makes sense that the player gets a score, and gains or loses points depending on a win or loss. I guess the thing I don’t know about these is that, while I’ve played some apps that give me a puzzle-solving rating, I don’t know if I actually change the puzzle’s rating at all (or if the puzzles even have a rating). I also don’t know whether, if I beat a puzzle, that has any impact on anyone else’s puzzle score or rating.

This is all getting over-complicated, so let’s simplify:

  1. The current system of puzzle authors setting the difficulty as a rank from 30k to 9 dan is too fine-grained, and thus is often inaccurate and hence misleading. Therefore, this system needs to change.
  2. There are many fancy things you could do with automatic ratings of puzzles, player voting, etc., but it will be hard to agree on the details of how to implement this, there is the potential for unintended consequences, and it would be a lot of developer effort, so it would be prone to bugs and might simply be too much work to ever get done. Hence, this is maybe not the way to go.
  3. The simplest, quickest and easiest thing to do is to just replace the list of 40 ranks in the difficulty drop down with a list of 5 items: elementary, easy, medium, hard, very hard. This might not be perfect but it’s better than the current system so it adds value immediately, it is quick and easy so might actually get done, and it’s simple so I think it should be easy to agree that this is a good change to make.

To facilitate the transition from rank difficulty to broad difficulty categories, existing puzzles will need to have their difficulty converted, something like 20-30k = elementary, 10-20k = easy, 1-10k = medium, 1-5 Dan = hard, 5-9 Dan = very hard. The details of this are not too important, as authors may wish to review the difficulty, and it may be desirable to have some other system for changing the difficulty in the future, e.g. up/down voting. But for now, I suggest we do the minimum that will provide a definite improvement, i.e. basically, just change the drop-down menu.
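The proposed conversion is mechanical enough to sketch. The boundary choices below (for example, which side 5 dan falls on) are illustrative, and as noted, authors may want to review the result either way:

```python
def difficulty_category(rank, is_dan=False):
    """Map an author-assigned rank to one of five broad categories.

    rank: kyu number (30 weakest .. 1 strongest) unless is_dan is set,
    in which case it is a dan number (1 .. 9). Boundaries follow the
    suggested 20-30k / 10-20k / 1-10k / 1-5d / 5-9d split; exactly
    where 20k, 10k and 5d land is an arbitrary choice in this sketch.
    """
    if is_dan:
        return "very hard" if rank >= 5 else "hard"
    if rank >= 20:
        return "elementary"
    if rank >= 10:
        return "easy"
    return "medium"

# e.g. a puzzle currently labelled 25 kyu would become "elementary",
# and one labelled 3 dan would become "hard".
```

A one-off migration would just run this over every existing puzzle’s stored rank, which is why the conversion itself is close to trivial.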

1 Like

Just to add to this, in some instances it’s not that the rating is too fine-grained. It can be that a puzzle set with ~190 puzzles, for example, has one rating (25 kyu) for the whole set, even though there is the option to rate them individually. Or you could have the same set of puzzles from another user being labelled at an entirely different rating (12 kyu). So user-decided ratings are not necessarily accurate or consistent, and it’s possible that a user uploading many puzzles just doesn’t want to bother individually rating all of them.

This feels like a bit too broad a category, though. You’d have to either ask current puzzle uploaders to update their ratings to the new system, or just auto-convert them, but auto-converting assumes the old ratings were accurate to begin with, which is hardly the case.

So I don’t think this works if we’re not already confident that puzzles sets are at a reasonable rating at the moment (in particular the popular ones).

I don’t know if it actually achieves anything, though, if we just relabel the problems without any effort to fix the ratings themselves.

Maybe I missed something or am I following an incorrect line of reasoning, but if I go to the Puzzles section


I see that the rating is:

  • done on a collection-of-puzzles-level, but not always on an individual puzzle level.
  • all (as far as I can see, at least on the first three pages) 4.5 stars

If I open one of the collections of puzzles I see


where we can rate an individual puzzle (I put it on 1 star) on a five-star scale (so there is already a possibility to make a five-category scale like 20-30k = elementary, 10-20k = easy, 1-10k = medium, 1-5 Dan = hard, 5-9 Dan = very hard)

Is it your intention to rate all individual puzzles of all collections of puzzles?
There are about 1300 collections of puzzles and I don’t know how many individual puzzles. That would mean quite a task :smiley:
Don’t worry, I will help you.

And again … Maybe I missed something or am I following an incorrect line of reasoning,… so correct me if am wrong.

I think that could be a misunderstanding arising from the double use of the word ‘rating’. On the one hand, we (I) sometimes use the word rating interchangeably with rank, because the two are usually connected (1700 ~ 8 kyu, etc.). The rating I would refer to in the case of the puzzle is really the rank the uploader assigns to it, in your image 25 kyu. The way a puzzle set gets a ranking or ‘Difficulty’ seems to be that it takes the easiest and hardest rankings of the puzzles and displays them as a range, so for that collection 25kyu-9kyu is displayed because it contains 25 kyu puzzles (like the one you linked) and (a) 9 kyu puzzle(s) Play Go at online-go.com! | OGS.

The rating in the sense you are highlighting with the stars is usually an indicator of the quality of, or a user’s satisfaction with, the puzzle. You might rate it 1 star if it’s poorly explained, if the solution is incorrect, or if there are various other issues with it. I would wager that a collection’s overall rating is probably an average of the ratings of all the puzzles in the collection. So a 4/5 star puzzle collection probably has a high number of highly rated puzzles, and hence they show up first on the list of OGS puzzles. That is, these puzzles are good quality/well made, generally popular in some way, etc.

2 Likes

Makes sense, thanks for correcting me.
But still wondering what @stephen-biggs-fox is aiming at, so what the next step in this project is.

I don’t think having a 10k rating band in difficulty is conducive to actually studying effectively. If I’m an 8k, I want 10-12k puzzles, not 20k puzzles. If I’m a 15k, I don’t want 10k puzzles or 30k puzzles: I want ones in the 20k-17k range (roughly). Obfuscating inaccurate ratings by just making uselessly broad rating bands doesn’t improve things.

Ultimately, I don’t do tsumego on OGS at all because there are far better platforms for it. Part of what makes those platforms better is accurate ratings. Author-generated ratings will never be accurate; unless something is done to automate puzzle ratings (which, again, is done literally everywhere else), that deficiency won’t be meaningfully addressed.

4 Likes

I am not very familiar with rest of the internet go scene.
Which platforms do you refer to?

From what you write I could draw the conclusion:

all else is not useful.
Correct?

Your post is an eye-opener to me, thanks!

1 Like

This seems fair enough for the moment. There’s lots of apps and websites etc.

So one could make a separate puzzle rating for each user, which again seems common on other sites (I’m imagining chess dot com or lichess). Then, in theory, however the rating works (whether just the user, or both user and puzzle, have a rating), you could work on puzzles of your level, provided the puzzles achieve a stable rating or are set at a sensible rating (someone still has to make a call on that).


You could also go the other way: don’t give users a puzzle rating, and don’t adjust their actual ranked-game rating either, but use that ranked-game rating to set an average difficulty level for a puzzle. I’m imagining something like a histogram of passes/fails by rank for a puzzle, and then assigning an average rating based on that somehow. The idea would be that the rating tells you that OGS players of this level typically can or can’t solve this puzzle, and these would be the people you’d be playing rated games against.

One way, for instance, would be to assign a difficulty band to the puzzle; let’s imagine it’s (8 kyu, 10 kyu). If you had a histogram of attempts at the puzzle, showing the number of passes and fails against the rank of the players, then maybe the 10 kyu figure could mean something like x% (90%?) of attempts by players below 10 kyu are fails, and the 8 kyu figure could mean y% (90%?) of attempts by players above 8 kyu are passes. Something like that could give you an idea of how attempts at solving the puzzle play out among OGS players.
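As a rough sketch of that idea, assuming attempt data as (kyu rank, solved) pairs, where a smaller kyu number means a stronger player. The 90% threshold is the placeholder from above, and a real version would need minimum sample sizes per rank:

```python
def difficulty_band(attempts, threshold=0.9):
    """Estimate an (easy_for, hard_for) kyu band from attempt data.

    attempts: list of (kyu_rank, solved) pairs; smaller kyu = stronger.
    easy_for: the weakest kyu rank r such that players at r or stronger
              solve the puzzle at >= threshold rate.
    hard_for: the strongest kyu rank r such that players at r or weaker
              fail at >= threshold rate.
    Either can be None if no rank meets the threshold.
    """
    ranks = sorted({r for r, _ in attempts})
    easy_for = None
    for r in ranks:  # ascending kyu number = strongest players first
        solved = [s for rank, s in attempts if rank <= r]
        if sum(solved) / len(solved) >= threshold:
            easy_for = r
        else:
            break
    hard_for = None
    for r in reversed(ranks):  # weakest players first
        solved = [s for rank, s in attempts if rank >= r]
        if 1 - sum(solved) / len(solved) >= threshold:
            hard_for = r
        else:
            break
    return easy_for, hard_for

# Synthetic data: 6k and 8k players almost always solve it,
# 12k players almost always fail -> band comes out as (8, 12).
attempts = ([(6, True)] * 9 + [(6, False)]
            + [(8, True)] * 9 + [(8, False)]
            + [(12, False)] * 9 + [(12, True)])
band = difficulty_band(attempts)  # -> (8, 12)
```

A tight band like (8, 10) would then look exactly like the “some tesuji everyone above a certain level knows” case described next, while a wide or empty band would suggest the puzzle doesn’t discriminate cleanly by rank.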

One can imagine that a very tight rating band could be indicative of some tesuji or idea in the puzzle that a large number of players below a certain rank might not have seen, but that a large number above it have.

Or maybe it’s just the case you get a broad (too broad) rating range for it to be useful, I don’t know.


There’s still the issue though of what it means to solve the puzzle…

1 Like

TL;DR - Actually, don’t convert existing puzzles from using difficulty ranks to difficulty categories. Just add the categories to the top of the difficulty drop-down menu so puzzle authors can use categories or ranks as they wish. We should make this trivial change now, and aim to develop an automatic puzzle ranking system in the longer term.

Long version:

@Atorrante Yes, I am talking about the difficulty (rank), not the rating (quality out of 5 stars). Apologies for my ambiguous language.

My proposal of changing the difficulty drop-down menu from 40 ranks to 5 categories does fix the problems (at least partially). Currently, a puzzle creator has to select the difficulty from a list of 40 ranks. This is near impossible to do accurately. However, even a beginner can probably select the correct difficulty from a list of 5 categories.

And as a puzzle solver, I don’t care if a puzzle is, say, 12k vs 10k. I just want to try a selection of elementary to hard puzzles (maybe start with some elementary ones as a warm up, then some easy, a few medium and a hard one to finish - and skip the very hard for me!).

Also, if a puzzle currently has the wrong difficulty, then correcting it is easier with 5 categories than it is with 40 ranks.

I think it does work. If we are reasonably confident in a puzzle’s difficulty as a rank, then we can be reasonably confident when this is converted to a category - so no problem there. And if we are not confident in the difficulty as a rank, then we might also have low confidence in the difficulty as a category, but then we end up no worse than we started (or maybe slightly better because, as above, fixing a category is probably easier than fixing a rank). But then again, an inaccurate rank might become an accurate difficulty category, which is a win.

Although, we don’t even need to convert. Just leave the existing puzzles with a rank difficulty and add the difficulty categories to the top of the drop-down menu so new puzzles can use those (or rank if the author really wants to). This then becomes an almost trivial change to implement.

There’s nothing wrong with doing some puzzles that are a bit on the easy side or a bit on the hard side. If they’re way too hard, then yeah, not so useful. But a bit on the hard side will push you further, and easy ones are good for practising fast reading and making sure the fundamentals are solid. This is why I think broad categories are actually better - you get a mix of puzzles in your target range. I doubt very much that a player who is, say, 10k would not benefit from puzzles accurately ranked anywhere from 15k to 5k, or even outside those limits.

@shinuito and @ckersch - I agree that some sort of automatic ranking system would be better in the long term, but the trivial change of adding difficulty categories to the top of the drop-down menu can be done now to improve the situation until a robust system is devised and implemented.

1 Like