By “announced level” I mean the level which is displayed when selecting a bot in the list, and by “in game level” I mean the level which appears in game, and which (importantly) determines their handicap.
I’ve seen a bot announced as 19k and then appear in game as 25k.
If you want explanations, we need the name of the bot, what level was announced where and when, and its real level where and when, so we can see what the problem is, if any.
Since I clearly didn’t know either, I’ll speculate…
Given: The graph indicates that when it wins its reported rank increases and when it loses its reported rank declines - i.e., same as with every other player in the system.
Therefore: Unless the bot truly learns and improves based on its gaming experience, the change in reported ranking is an artifact of our ranking system - not an indication of its improvement (or deterioration).
So the assumption would be that humans can improve (or deteriorate) but these particular bots do not. (They are static models once they are released.)
There are certainly forms of artificial intelligence that do learn from experience. But unless these particular AI models are dynamic in that sense, their inherent ability remains fixed at some particular level, regardless of what the screen says.
Right, but one might expect that if a bot had a “level of skill” that could be represented by a rank, then it would win and lose against its opponents in line with the rank difference, and so the rank system would produce a relatively stable output.
That’s not what we see - and it is what we see for humans, even those who play a very large number of games (as bots do). So it’s not just “the number of games” that produces these wild swings.
Which tells me:
Bots are not a good anchor reference for rank
Labelling/announcing bots as being “a rank” is not the right thing to do.
This appears to follow logically, but leaves the corollary challenge: some indication of strength is still needed for practical purposes (i.e., choosing which model to play) - hence something like your “weak” vs “strong” labeling idea.
Unless these particular bots are dynamic, learning AI models, the constant changes in their reported strength remain an artifact of the ranking algorithm, not a genuine change in playing capability. Unlike learning humans, they are static over time. So “weak” will remain weak, regardless of how many victories it achieves, and the same goes for “strong” despite a losing streak.
Since all the responses to this line of thought were moved to another thread, I’ll just say I disagree, and any continuations can go in that thread.
Honestly the whole anchor discussion is probably even off topic here (and a lot of people don’t like the KGS anchor system, because it leads to huge fluctuations for humans who haven’t even played a game in a month or so).
I think that if this is a quick fix, it’s really worth fixing, and thanks to @Bibu54123 for bringing it up.
There are a number of issues with weak bots that beginners might play: they can be wrongly ranked, their rank in game may not match the rank shown when you challenged, and some of the weak bots are not actually weak in the way you expect and don’t really make good opponents for new players.
(I mean - my intention was that the whole lot went with the other thread).
If someone’s keen, I imagine it could even be done in the front end, quickly.
Just define a map of rank-range → description, and map each bot’s current rank to its description in the display there.
With this quick-fix, some bots on the edge of the ranges might fluctuate, but if you make enough grades (“Really dumb”, “beginner”, … … … “advanced”) then this would be minor.
That might be a little hacky, but a backend fix might be a lot slower to materialize…
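Something like the following rough TypeScript sketch is what I have in mind; the band boundaries, labels, and function names are all made up for illustration, not actual OGS code:

```typescript
// Hypothetical quick-fix: map a bot's current kyu rank to a coarse label.
// Thresholds and labels are illustrative only.

interface RankBand {
    minKyu: number; // weakest (highest-numbered) kyu included in this band
    label: string;
}

// Ordered from weakest to strongest; make the bands wide enough that a bot
// near a boundary doesn't flip labels every few games.
const BANDS: RankBand[] = [
    { minKyu: 25, label: "Really dumb" },
    { minKyu: 18, label: "Beginner" },
    { minKyu: 10, label: "Intermediate" },
    { minKyu: 1, label: "Advanced" },
];

// Convert a kyu rank (e.g. 23 for 23k) to its description for display.
function describeBot(kyuRank: number): string {
    for (const band of BANDS) {
        if (kyuRank >= band.minKyu) {
            return band.label;
        }
    }
    return "Advanced"; // dan-level or stronger
}

// describeBot(27) -> "Really dumb", describeBot(19) -> "Beginner"
```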
Do you mean some of the bots currently “announced” as weak are “not weak in the way you expect”, or do you mean that there are bots whose current rank appears weak but it’s not what you think? The latter has no solution other than human observation and categorisation, I think?
I assume that “weakness” results from either (1) sharply limiting a bot’s time/depth before forcing it to make a move, and/or (2) introducing an occasional random forced error to mimic a human blunder.
At least this is how I dumb down bots when I use well-known open-source models on BadukAI. At my playing level I still have little chance of a victory… but it’s slightly less humiliating.
In short, I undermine a model’s “pondering” function by customizing specific parameters.
Translating a set of such impairments into common-language equivalents (dumb, dumber, dumbest) would then be a matter not of kyu/dan rank assignment but of language arts.
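For what it’s worth, here is a rough TypeScript sketch of those two knobs; the Engine interface, function names, and parameters are all assumptions for illustration and don’t correspond to any particular bot’s real code:

```typescript
// Hypothetical sketch of the two "weakening" knobs described above,
// against a generic engine interface.

interface Engine {
    // Return candidate moves ranked best-first, searching at most maxVisits nodes.
    genmoveCandidates(maxVisits: number): string[];
}

interface WeakeningConfig {
    maxVisits: number;   // (1) cap on search time/depth before forcing a move
    blunderRate: number; // (2) probability of deliberately playing a poor move
}

function pickWeakenedMove(engine: Engine, cfg: WeakeningConfig): string {
    const candidates = engine.genmoveCandidates(cfg.maxVisits);
    if (Math.random() < cfg.blunderRate && candidates.length > 1) {
        // Occasional forced error: pick a random non-best candidate
        // to mimic a human blunder.
        const i = 1 + Math.floor(Math.random() * (candidates.length - 1));
        return candidates[i];
    }
    return candidates[0]; // otherwise play the best move found at reduced search
}
```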
I don’t think it’s about whether the bots fit into the dumb or beginner category.
Look at @Feijoa’s screenshots, amybot will display 19k on the play page and 23k on their profile page.
There’s a chance it’s pulling the information from different places, I think the available bots list is using a socket connection, and the profile pages are probably using the termination API or something similar.
On the other hand, there’s also a chance that by the time you look at one list and check another page, the bots’ ranks have actually changed.
Bouvardia was listed as 18k, as in @Feijoa’s screenshot; when I opened the profile page it was 17k, and when I refreshed the profile page 30s later it was 19k, after just one game.
If the issue is more or less just a “visual issue”, couldn’t we keep the ratings as they are, but switch the bot’s displayed “rank” to something like a yearly average or a regression-to-the-mean value? It should be fairly easy to code - one function call to show the average rank. The rating would still fluctuate, but it would average out, and we’d still have a rough idea of how “strong” a bot relatively is from its long-term average; if players really want the actual rating, they can just switch to ratings.
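Something along these lines might work; a rough TypeScript sketch, where the rating-history shape and function name are assumptions rather than OGS’s actual data model:

```typescript
// Hypothetical smoothing of a bot's displayed rank: show the average rating
// over a recent window instead of the live, fluctuating value.

interface RatingSample {
    timestamp: number; // epoch seconds
    rating: number;    // rating at that point in time
}

// Average the samples that fall inside the last windowDays days.
function smoothedRating(history: RatingSample[], windowDays = 365): number {
    const cutoff = Date.now() / 1000 - windowDays * 24 * 3600;
    const recent = history.filter((s) => s.timestamp >= cutoff);
    if (recent.length === 0) {
        // No games in the window: fall back to the most recent known rating.
        return history.length ? history[history.length - 1].rating : NaN;
    }
    return recent.reduce((sum, s) => sum + s.rating, 0) / recent.length;
}
```

The actual rating graph and the live rating would stay untouched; only the number shown next to the bot’s name would use the smoothed value.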