A small experiment on OGS new player ranking

JethOrensin · February 19, 2023, 4:49pm

I have seen various threads and people complaining about OGS ranking problems for newbies (especially @Uberdude's compelling arguments on that issue), and a lot of people posting a vast array of numbers about it. I will be honest: I never much cared about the whole thing, since it is an executive decision that I couldn’t really go back and test in order to form an opinion on.

However, after seeing that Baduk miracle exam, I wondered whether I could have passed it, and thought that maybe it was high time to revive an alt account that I had made for fast/blitz games, to see if I can actually play that fast live and to train myself.

And while at it, I thought I might finally check all those things about the newbie ranks and whether it is a real problem.

By my estimation, when not given ample time to think I am barely 10k, and to keep it even fairer all the games were played past midnight or, in the case of the last one, when I was half-asleep. This should simulate a new high-SDK player well enough.

The experiment ran like this: I would play against people of the level I thought I was really at, which is 12k to 7k, with 5m main time and 5x10s byoyomi.

To my surprise I was doing “ok”: after the first games I had won against one player of each rank in that range, and my rank had inflated to 2.4k, just by winning against opponents that were nowhere near that or the initial 6k ranking.

This means that at that moment that account had a better ranking than my actual account, which makes no sense.

Bear in mind that this is a result that an actual new high-SDK player account could very reasonably have achieved. At that moment, though, I was getting a lot of cancellations, and people were really afraid of the [?] and the inflated rank. A new player might have felt that this is not a good environment to play in, since other players would be actively avoiding them and cancelling the games.

After that, I unfortunately lost on time: the opponent started a ko and I didn’t notice that I had no byoyomi periods left. So I couldn’t test how far up the ranks I could have gone with victories like that before I lost the [?]. In the last game I missed an atari, being asleep at the helm and just playing my own moves. Thus the experiment ended at 6k instead of around 3-4k, which is still absurd as a new-player experience for someone who only won against high-SDK and DDK players.

I think that some mock test should be run to see the actual results on cases like that (which are not really edge cases at all) and re-evaluate the “new player experience” because I really got a lot of cancellations after the first games. Something seems to be wrong in this system at this moment and it probably needs some tweaking.

P.S.
I could not find an appropriate topic to put this in. If a moderator wants to append this to a different topic in which it would serve better, please do so and sorry for any extra bother.

shinuito · February 19, 2023, 5:01pm

You can look at your actual rating history also if interested

ended	game_id	played_black	rating	deviation	volatility	opponent_id	opponent_rating	opponent_deviation	outcome	extra	result
1676820608	51227645	1	1488.41	159.77	0.060009	1079861	1217.08	66.24	0	null	13.5 points
1676738159	51199289	1	1618.65	165.27	0.060001	931557	1212.33	62.66	1	null	10.5 points
1676736745	51198689	1	1603.37	171.77	0.060002	1255125	1362.87	63.65	0	null	Timeout
1676677915	51178817	1	1753.30	179.19	0.059994	1168552	1415.39	60.95	1	null	Timeout
1676676032	51177990	0	1726.64	191.67	0.059995	523083	1157.08	61.78	1	null	47.5 points
1676673912	51177506	0	1718.27	195.89	0.059996	694184	1215.65	64.13	1	null	Timeout
1676591939	51148623	0	1705.42	202.66	0.059996	427128	1448.70	61.85	1	null	0.5 points
1676590412	51148128	1	1648.79	232.52	0.059998	24089	1335.91	61.74	1	null	26.5 points
1553283755	17107916	0	1590.09	271.54	0.059999	593528	1272.10	62.46	1	null	Resignation
1553197355	0	0	1500.00	350.00	0.060000	0	0.00	0.00	-1	{special:initial rating}
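For anyone who wants to read the rating column above as ranks, here is a minimal sketch. It assumes a logarithmic rating-to-rank conversion with the constants 525 and 23.15 and a 1d boundary at 30; none of that is stated in this thread, so treat it as an assumption. Under that assumption, the 1500 starting rating comes out at about 5.7k and 1726.64 at about 2.4k, which lines up with the 6k and 2.4k mentioned in the first post.

```python
import math

# Assumed conversion constants (not given anywhere in this thread).
A = 525.0
C = 23.15


def rating_to_kyu(rating: float) -> float:
    """Convert a rating to a fractional kyu value under the assumed formula.

    Values at or below 0 would fall in the dan range under this assumption.
    """
    rank = math.log(rating / A) * C
    return 30.0 - rank


# The rating column from the history above, oldest entry first.
history = [1500.00, 1590.09, 1648.79, 1705.42, 1718.27,
           1726.64, 1753.30, 1603.37, 1618.65, 1488.41]

for rating in history:
    print(f"{rating:8.2f} -> {rating_to_kyu(rating):4.1f}k")
```

The list is copied by hand from the table; reading the tab-separated text directly would work just as well.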

We can look at this two ways though. If your account didn’t rank up fast, we’d have people complaining that new players that are actually Dan ranked or kyu ranked couldn’t reach 2 kyu or 1 Dan fast enough and have to keep beating weaker players.

If anything, if you really didn’t know your rank, you would’ve just played against a 2 kyu or 6 kyu to see if you could beat them, lost to them, and had your rank decrease rather than airbag for a while.

Similarly people complain that beginners can’t drop in rating fast enough, but I know a new player who’s an absolute beginner that after losing a single game to a 12kyu, was able to play ranked games against 22kyu players for instance the very next game.

It seems like people might find something to complain about no matter what

JethOrensin · February 19, 2023, 5:13pm

I think that someone would hardly complain about being worth a dan rating for winning against 12k-7k players. I honestly wish that I had managed to gather straight wins and seen where I would have landed, but, unfortunately, I failed in that.

As I said earlier, the problem is with actual new players of around the 9k-6k rank. Just like me when I ran that test, they wouldn’t challenge 2k players, and they might actually win a few games in a row against their 9k-6k peers. It makes no sense to over-inflate their initial rank and land them in the “oh my God, look at that rank, I am cancelling this game” zone.
That’s hardly welcoming or good for either category of players, since low-ranked players will just learn to be wary and avoid [?] ranked opponents.

Similarly people complain that beginners can’t drop in rating fast enough,

I didn’t test that, so I have no opinion on this

This topic is not a complaint, but just the result of a small experiment, and thank you for giving me the chance to make that clear. If there is an executive decision that “this is fine” then that is not really my call nor my problem. I am not a new player after all. I am just recording what happened for anyone that might be interested.

shinuito · February 19, 2023, 5:17pm

I’m not sure how exactly this reads, and not to say that you’re complaining - it’s interesting to see different examples of experiences and so on.

Just what I mean is that whatever the starting rating, one could of course only play players below that point and never lose points beyond that initial rating.

I’m guessing glicko is maybe overcompensating a bit, and I understand the point that like 2kyu when only beating 8kyu and below seems a bit much, but I think “real” tests should probably compare people that try to achieve a rank, as opposed to look for possible edge cases of the system.

As in if the system works reasonably well but some user can reach X rank in a weird way because they’re not trying to achieve an accurate rank - does that mean that the whole system is bad?

qnpnpmqppnp · February 19, 2023, 5:20pm

I agree there are issues with new player rankings on OGS, but when saying that I rather have in mind the beginners who arrive on OGS, try to play against a noob bot and get told that they can’t because the rank difference is too high.

What you describe, however, doesn’t seem too problematic. Yeah, your rating is off, but it’s always difficult to adjust a ranking based on a few games, especially if the player (like you did) doesn’t rely on automatch but insists on playing only within a specific range. I believe you’d have had a more accurate rank if you had played whatever opponents automatch gave you.

In any case I don’t think newcomers who are already decent at Go will be too confused by that, though I may be wrong of course.

JethOrensin · February 19, 2023, 5:24pm

I honestly do not know the answer to that. Personally I find the DGS approach on this issue both optimal and transparent.
a) DGS, if I remember correctly, allows you to set your own initial rank. If you lied, you will drop like a stone, just like here, but at least a new player that didn’t lie about their rank (the vast majority of new players) enjoys an immediately balanced experience.
b) DGS has this awesome feature where you can tell EXACTLY how many points you would win or lose after playing a game against a player. I think that this is a feature that would be well worth investing in having in OGS.

It even calculates the rating changes with handicap or without handicap.

With:

Without:

DGS might be old, but this seems like an elegant and user-friendly solution.

jlt · February 19, 2023, 5:26pm

I don’t think it’s absurd to be ranked 6k if you have won 7 games out of 9 against players around 10k.

For comparison, on my alt account my first games (some time ago) were:

win against 13k
win against 6k
win against 3k
win against 3k
loss against 2k

and then I was ranked 2k.

JethOrensin · February 19, 2023, 5:39pm

If I had won 9 games out of 9 against opponents around 10k, I would have been ranked around 1k, which is a rank much higher even than that of my actual account.

That is not a good potential result and it is a series of victories that an actual new player could achieve (though I do not want to create a new account and test this. I assume that the dev team can run mock tests fast and efficiently and test that out)

After winning two games against 3k. Hardly something that a person that is not above/around that rank could achieve.

On my self-rank valuation for live games around 10-9k, I was ranked 2k after winning 5 games against people of 12k to 7k.
In the span of five games, we got the same end result via a very different way.

jlt · February 19, 2023, 5:46pm

Being rated 2k after 5 wins against players around 10k and no losses is not absurd either.
I don’t know about the OGS rating system, but the EGF rating system expects a 6k to lose 1 game out of 4 against 10 kyus. So a rating of 6k in your case would have been too low. Anyway after 5 games your rating still has a large uncertainty.

Allerleirauh · February 19, 2023, 5:51pm

When I was sandbagging 2k+ players no one cancelled on me. Though it seems only those who were ready to play stronger player joined in the first place.

shinuito · February 19, 2023, 5:55pm

That should be pretty easy to incorporate actually I imagine. I guess the next obvious question would be where do we put that? Just a separate page under tools? I guess that’d make sense

Maybe if we get that calculator up and running we could answer the question of “What rank should you be to have an almost 0% chance of losing to an 8kyu, 10kyu, etc.?” It probably should be a good few ranks higher, I imagine, like maybe 3-4 at least?
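A rough way to explore that question is to compute an expected win probability for different rank gaps. The sketch below uses the expected-outcome formula from Glicko-1, which is a simplification and not necessarily what OGS runs (the history above has a volatility column, which suggests Glicko-2). The rank-to-rating conversion with constants 525 and 23.15 is an assumption, and the opponent deviation of 62 is simply picked to resemble the opponents in the rating history.

```python
import math

Q = math.log(10) / 400.0


def kyu_to_rating(kyu: float) -> float:
    """Assumed inverse of a logarithmic rating-to-rank conversion."""
    return 525.0 * math.exp((30.0 - kyu) / 23.15)


def g(rd: float) -> float:
    """Glicko-1 factor that discounts an opponent with an uncertain rating."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    """Glicko-1 expected result (between 0 and 1) against one opponent."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))


# How often would players of various ranks be expected to beat a 10k?
for my_kyu in (9, 8, 6, 4, 2):
    p = expected_score(kyu_to_rating(my_kyu), kyu_to_rating(10), 62.0)
    print(f"{my_kyu}k vs 10k: expected win rate {p:.0%}")
```

Changing the 10 in the loop to 8 answers the same question for an 8kyu opponent.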

shinuito · February 19, 2023, 6:49pm

Step one:

find out how to add a new page and link it to the nav bar

Step 2 (tbl): figure out how to import the ratings calculation functions, and make some buttons to call them

JethOrensin · February 19, 2023, 8:17pm

It doesn’t sound very logical though. It might be correct mathematically though, so…

… in this case, we could say that the experiment is favorable to the system working properly? I have no problem with that

That would be good, though another suggestion would be to put it on the hovering window if you click on a player:
Screenshot_2

Under the 4.1k
it could say how much your rank would change if you played against them and won, then a dash, and then how much your rank would change if you lost.
E.g. in my case it could say 2.8k (win) - 3.2k (loss) (numbers imaginary)

That way there will be no clutter with extra interface and the information would be immediately and easily located by someone that might want to check various players (e.g. in a ladder) in order to decide whom to challenge.
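As a sketch of what such a win/loss preview could compute, here is a single-game update using the Glicko-1 formulas. This is a swapped-in simplification: the site presumably uses Glicko-2 with volatility, and the function names here are hypothetical. As a sanity check, feeding in the first game from the rating history above (1500 with deviation 350, beating 1272.10 with deviation 62.46) gives a result within a point of the 1590.09 and 271.54 listed there.

```python
import math

Q = math.log(10) / 400.0


def g(rd: float) -> float:
    """Glicko-1 factor that discounts an opponent with an uncertain rating."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def glicko1_update(r, rd, r_opp, rd_opp, score):
    """Return (new_rating, new_deviation) after one game; score is 1 for a win, 0 for a loss."""
    g_opp = g(rd_opp)
    expected = 1.0 / (1.0 + 10 ** (-g_opp * (r - r_opp) / 400.0))
    inv_d2 = Q * Q * g_opp * g_opp * expected * (1.0 - expected)
    denom = 1.0 / (rd * rd) + inv_d2
    new_r = r + (Q / denom) * g_opp * (score - expected)
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd


def preview(r, rd, r_opp, rd_opp):
    """Hypothetical helper: rating after a win and after a loss against this opponent."""
    win, _ = glicko1_update(r, rd, r_opp, rd_opp, 1.0)
    loss, _ = glicko1_update(r, rd, r_opp, rd_opp, 0.0)
    return win, loss


# First game from the rating history above.
new_r, new_rd = glicko1_update(1500.0, 350.0, 1272.10, 62.46, 1.0)
print(f"after the win: {new_r:.2f} +/- {new_rd:.2f}")

win, loss = preview(1500.0, 350.0, 1272.10, 62.46)
print(f"preview: {win:.0f} (win) - {loss:.0f} (loss)")
```

The two preview numbers could then be converted to ranks and shown under the player's rank in the hover window.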

jlt · February 20, 2023, 9:03am

Anyway, if your game history has 5 wins and 0 losses, the system can only estimate a lower bound for your rank. You need both wins and losses to get a reliable estimate. So even if 2k was a bit overestimated, this is not a bad thing, because it increases your probability of being paired against stronger players (assuming you use automatch) and thus of losing your next game.

Animiral · February 20, 2023, 9:04am

Imagine an AI-based ranking system, where instead of just wins and losses, every move in every game that you play is a data point. It could make for the kind of accurate ranking system that you expect.

Groin · February 20, 2023, 9:14am

Interesting, will we get that difference between good players (according to AI ofc) and good winners?

shinuito · February 20, 2023, 9:15am

I’m not quite sure how well this would work in practice though, because games aren’t just about the moves themselves but the reasons behind the moves.

If a much stronger player plays a much weaker player (ranked games happen in ladders), knows they’re already ahead by 30 points and starts playing slack, what happens to these data points? Do they start lowering the player’s rank because they’re not playing optimally to crush the opponent by as large a margin as possible?

That said, if we treat the AI as a magic box that knows when to discount certain moves and maybe it only estimates someone’s strength based on their best and worst moves in a game somehow then maybe

However I can imagine it causing much more of a burden on the players mentally if their every move and mistake could impact their rank and not just whether they won or lost.

Groin · February 20, 2023, 9:25am

I don’t think replacing one with the other is a good idea. However well you play weiqi, you have to win first.
Now, having a quality rating alongside the win rating seems an awesome feature. I’m no programmer, but it seems that most of the job is already done somewhere (as they analysed pro games vs AI), so it’s mostly just building the rating and UI from these.

system · May 22, 2023, 5:25pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.