I’ve noticed that some bots have been seriously de-ranked. A former Dan level bot is now 7k because too many people are beating it with 9 stones dozens and dozens of times. I think they should have static ranks so the same few people playing it hundreds of times do not give it a horribly inaccurate rank. I think there also might be a rank issue if the same users play each other over and over with the same results.
If I play a Dan bot and give myself 9 stones and beat it 100 times will I get to 5D?
I don’t think static ranks for bots are the correct solution to the problem you describe. Rather, multiple games by the same user against them should not be treated as independent and equal by the rating system, but should have an increasingly diminishing rating effect (for both parties).
E.g., say n is the number of games between the user and the bot in the last 30 days.
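To make the diminishing-effect idea concrete, here is a rough sketch. The weight formula `1 / (1 + n)` is purely an assumption picked for illustration, not the formula the poster had in mind, and `repeat_weight` and `weighted_change` are hypothetical names.

```python
def repeat_weight(n: int) -> float:
    """Weight for the next rated game when the same pair has already
    played n games in the last 30 days (assumed formula: 1 / (1 + n))."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 / (1 + n)


def weighted_change(full_change: float, n: int) -> float:
    """Scale the rating change the system would normally apply."""
    return full_change * repeat_weight(n)


# Total effect of 100 straight wins that would each be worth
# 10 points at full weight (the 10 is an arbitrary example value).
total = sum(weighted_change(10.0, n) for n in range(100))
print(round(total, 1))
```

Under this assumed weighting, 100 straight wins add up to about 52 points, roughly what five independent games would give, instead of 1000.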
My current thought on this is that bot games shouldn’t affect your “real” rank. My current sketch of an idea is to either just make all bot games unranked, or have a separate “bot rating” that doesn’t interfere with your normal ratings. In either case we might still need some way of rating a bot so we can automatically sort them and come up with some sort of strength of the bot, but that probably looks a bit different than simply rating the games for the bot as we do now, probably aggressively filtering out suspicious or abnormal games and turning down the deviation. Bot strengths can change over time as operators sometimes make updates to the engine or networks, and bot operators do like to see how strong their bot is against players, so I think there’s some value to still processing ratings in some way for bots, but I do think the system needs some work.
Downside of that is you lose the “If you are a new account that humans don’t want to play, then play some bots (who aren’t so picky) to get a solid rank first”.
How about having that low rate of rating change all the time for bot games?
Let’s say 10% of a human vs human game.
The results would be:
Normal users around the bot rank cannot gain much from repeat games. The most enthusiastic people play 50 games in a row, just for reference.
For the people in the 40-69k range (quite a lot), a first win against a 25k bot doesn’t insanely change their rank, but enough to stay motivated to play more and get into 25k territory.
Bots stay useful, but incentive remains to make the step to play humans.
Bots will still accurately reflect their rank compared to humans, because of the sheer numbers they play. Their rank will stay closer to the mean.
I agree that high handicap games shouldn’t be ranked as well, because that quickly and easily stops the abuse-bots-to-become-11dan-issue. That is a specific niche of the question of whether or how bots should be ranked.
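A quick way to see what the 10% proposal would do to a 50-game winning streak is a sketch like the one below. It uses a plain Elo update as a stand-in, not the site's actual rating formula; the K-factor of 32, the starting ratings, and holding the bot's rating fixed are all assumptions, and `streak_gain` is a hypothetical helper.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def streak_gain(games: int, factor: float, k: float = 32.0,
                start: float = 1500.0, bot: float = 1500.0) -> float:
    """Rating gained by winning `games` in a row against a bot whose
    rating is held fixed, with each update scaled by `factor`."""
    rating = start
    for _ in range(games):
        rating += factor * k * (1.0 - expected_score(rating, bot))
    return rating - start


print("full weight:", round(streak_gain(50, 1.0), 1))
print("10% weight: ", round(streak_gain(50, 0.1), 1))
```

At 10% weight each win is worth at most 1.6 points with these numbers, so even 50 wins in a row stay under 80 points in total.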
That’s true, we could also do something like adjusting your rating when your rating hasn’t been established yet, but then once your rating has been established we don’t count them anymore.
I think handicap games outside of the usual limits for ranked should be unranked anyway, like the ones where we saw 9 stones on 9x9.
Also, having the bots do free placement handicap (because it’s Chinese rules) and then placing them badly/randomly kind of reduces the effect of the handicap.
One could force the bots to use prescribed handicap positions in ranked at least.
But yeah if none of that really helps then just generally ruling out high handicap makes sense.
Another downside is that bots wouldn’t get ranked.
Please consider the earlier suggestion of weakening the effect of repeated games against the same player! When one player plays the bot 100 times in a row, it’s crazy that we weight that as much as 100 games against different players. This could apply to everyone as a normal feature of the ranking system.
The gradual decrease in ranks over the past year or so might be a different issue though.
I don’t have a strong opinion about that myself, but this is how it is done on Board Game Arena and many people there hate it; there are entire threads about why this is terrible/stupid/harmful etc. Not saying that the people saying that are necessarily right, but a) if I recall correctly, I think they have at least a few valid points and b) even if they are wrong, then it is still very likely many players here would feel the same. (Although maybe it would be less problematic here. We don’t have the ‘there are only 3 strong players in game X’-problem on this site after all.)
Could you share some of the points people raise at Board Game Arena? Reading this thread, I’m thinking the main issue with static bot ranks would be tying them to the existing rank system.
But obviously that is something that can be addressed (and possibly hasn’t been addressed over at Board Game Arena?)
If the bots are losing because they can’t handle high handicaps, then the admins or OGS can bar them from accepting such games, as already suggested.
If the bots are losing because they are not as strong as advertised, then they should be deranked.
If the bots are losing because the humans have figured out a way to beat them repeatedly, then that is, morally speaking if not literally, an exploit. Once upon a time, the mods annulled illegitimate wins resulting from an exploit and warned the abusers. For example, this was done frequently in the days when bots couldn’t recognize ladders.
This might be a bit technical, but I was thinking that a rating system like OGS’s is actually an adaptive machine learning system, with parameters that learn to predict the future winrate between players/bots from historical game results. So if we don’t update parameters with the games involving bots, those games effectively become a validation dataset instead of part of the training set, and future games are the test dataset. What would happen if we looked back at historical data, set cutoff points at different dates, and used bot games as an evaluation dataset only? What would the current ratings be based purely on player-only games? And if we split differently using K-fold, mixing in or excluding different bots, what would their impact on the ratings be? This might be a way to evaluate the true impact of bot games and how we should handle the bots’ rating adjustment. Based on that impact, we could decide whether to unrank them completely (if they don’t affect the prediction of ranks) or to give them a weighted value if they do contribute to a certain degree (e.g. if they cause a certain percentage of deviation, we can set the weight accordingly).
I think the main issue is that in the current ranking system there is little or no accounting for the rank difference of the players. I can play players ranked 5 stones below me without handicap and I get a score lift of 0.1 if I win. In fact it is very likely that I crush the opponent. I think the rank win/loss amount should be adjusted to the current rank difference of the players and the handicaps given (or not given).
But that’s exactly how the current ranking system accounts for the rank difference!
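A toy version of that experiment might look like the sketch below: fit ratings on earlier games with different weights for bot games, then score how well each fit predicts held-out human-vs-human games. Everything here is an assumption for illustration: plain Elo stands in for the real rating system, the game records are made up, and `fit` and `log_loss` are hypothetical helpers.

```python
import math
from collections import defaultdict


def expected(ra: float, rb: float) -> float:
    """Elo-style predicted probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def fit(games, bot_weight: float, k: float = 32.0):
    """Replay games in order; games involving a bot are scaled by bot_weight.
    Each game is (player_a, player_b, a_won, involves_bot)."""
    ratings = defaultdict(lambda: 1500.0)
    for a, b, a_won, involves_bot in games:
        w = bot_weight if involves_bot else 1.0
        delta = w * k * (a_won - expected(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


def log_loss(ratings, games) -> float:
    """Average negative log-likelihood of the held-out results."""
    total = 0.0
    for a, b, a_won, _ in games:
        p = expected(ratings[a], ratings[b])
        total -= math.log(p if a_won else 1.0 - p)
    return total / len(games)


# Made-up data: alice farms a bot, then loses to bob.
train = [("alice", "bot", 1, True)] * 20 + [("alice", "bob", 0, False)] * 2
held_out = [("alice", "bob", 0, False)]

for w in (0.0, 0.1, 1.0):
    print(w, round(log_loss(fit(train, w), held_out), 3))
```

A lower log loss means the ratings predicted the held-out games better, so comparing the values across weights (and across different cutoff dates or folds on real data) would indicate how much bot games should count.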
When you win against someone who is weaker than you, you gain only tiny bit of rating points. But if you manage to win against someone who is considerably stronger than you, you gain a ton of rating points ^^
And of course, vice versa for losses. Lose against a stronger player, and your rating will drop only a marginal amount. Lose against someone ranked lower than you, and your rating can take quite a big hit >__>
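To put numbers on that, here is a small sketch using the classic Elo formula. It is only a stand-in to show the shape of the effect, not the site's actual rating calculation, and the K-factor of 32 and the 400-point gaps are assumed example values.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo expected score: the predicted chance of winning."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def change(rating: float, opponent: float, won: bool, k: float = 32.0) -> float:
    """Rating change after one game: k * (actual result - expected result)."""
    return k * ((1.0 if won else 0.0) - expected_score(rating, opponent))


for gap in (-400, 0, 400):  # opponent's rating minus yours
    win = round(change(1500, 1500 + gap, True), 1)
    loss = round(change(1500, 1500 + gap, False), 1)
    print(f"opponent {gap:+d}: win {win:+.1f}, loss {loss:+.1f}")
```

With these numbers, beating a player 400 points weaker gains about 2.9 points while losing to them costs about 29.1, and the figures flip against a player 400 points stronger.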
This is probably the best idea for satisfying everyone – the only tradeoff is a bit more complexity in implementing the idea compared to the others.
Here’s my solution that’s even more difficult to implement but could have some extra benefits:
Introduce a classification of “ranked bots” with either static ranks or limit/cap the volatility of bot ranks. Players seem to have a cap of ±0.7 on their ranks. Ranked bots could be automatically set to ±0.3 and handicap games must be unranked. If that’s too restrictive, a high dan moderator could oversee and audit static ranked bots before they’re allowed to accept ranked handicap games.
For bot owners who want specifically to test the strength of their bots against the OGS rated player pool, they could be given a separate classification of “test bots” which don’t affect the rank of human players and only affect the bots’ rank, with normal volatility.