Score-based rating algorithm? (estimate score difference between 2 players)

Chinitsu · May 14, 2021, 10:07am

So I’m acquainted with Elo, but Elo is about the win rate between 2 players (it doesn’t take the actual point result into account).

But does anyone know of an algorithm that uses the point difference and can give an estimated point difference when 2 players face each other?

(Yes, I know about timeout and resignation situations; let’s just say we put a maximum cap of 100 points on those kinds of games. I’m just wondering if there’s such an algorithm that I can tweak for my own situation.)

teapoweredrobot · May 14, 2021, 10:36am

Sounds like a job for reverse Komi! Maybe @Samraku has thought about this.

Chinitsu · May 14, 2021, 10:37am

Exactly. I’m hosting a custom-komi in-house tournament these days.

teapoweredrobot · May 14, 2021, 10:38am

Also

Chinitsu · May 14, 2021, 10:54am

Very interesting. So they estimate that 1 rank equals about 12 points?

Samraku · May 14, 2021, 1:15pm

Appropriate reverse komi and expected score difference are related but not precisely the same. If both players play so as to maximize their score, then the expected score difference and correct reverse komi should be the same, but in an even game the stronger player will probably not try to win by as much as possible, particularly once they are already confident in victory, and may even try to make the gap as small as possible.

Related thread from when I was brainstorming how Lagrange Points would work:

EDIT: corrected Android having thought that “try” was “with”.

Chinitsu · May 15, 2021, 6:35am

So since the direction of the thread sounds like “anyone’s guess is as good as ours”, yesterday we tested out a system like this:

Each player starts with a starting rating, with ranks spaced 12 points apart, based on the Senseis’ article @teapoweredrobot posted.
For each game played between 2 players, each player’s rating changes by 0.2 * (difference from the expected value).

So for example

We have a game between a 120-rated player and a 60-rated player (so we expect the 120 player to win by 60 points).
The result is that the 120 player won by 80 points, 20 points above the expected value.
His new rating is now 120 + 20*0.2 = 124.
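
The update rule and worked example above can be sketched in a few lines of Python. This is only a minimal sketch, not the tournament’s actual code: the function names are made up for illustration, and the mirrored adjustment to the losing player’s rating is an assumption (the post only works through the winner’s side).

```python
K = 0.2  # update factor quoted in the post


def expected_margin(rating_a, rating_b):
    """Expected winning margin of A over B is simply the rating gap."""
    return rating_a - rating_b


def update_ratings(rating_a, rating_b, margin_a, k=K):
    """Return the new ratings of A and B after one game.

    margin_a: points A won by (negative if A lost).
    Assumption: B moves by the same amount in the opposite direction.
    """
    surprise = margin_a - expected_margin(rating_a, rating_b)
    return rating_a + k * surprise, rating_b - k * surprise


# The example from the post: 120 vs 60, the 120 player wins by 80.
new_a, new_b = update_ratings(120, 60, 80)
print(new_a, new_b)
```

With these numbers the 120 player moves up to 124, and under the symmetry assumption the 60 player drops to 56.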

So it’s a very rough system. It doesn’t take possible player growth into account (usually more recent games would get a higher weight, right?).

What does everyone think? Are there any big flaws with this one?

shinuito · May 15, 2021, 7:14am

Do you just estimate how much a player was losing by to update their rating if they resign?

Or do you encourage people to play to the end just to get a score?

Chinitsu · May 15, 2021, 7:18am

So it’s a team-based point-scoring tournament. Each point of territory you score gets added to your team’s score.

Resignations, timeouts, and wins by more than 100 points are capped at 100.5 (this is the Korean manbang gambling rule, I think?).

Currently we count all of those games as 100.5 in the calculations as well.
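
As a rough illustration of that cap, here is a small Python sketch. The helper name and its arguments are hypothetical, and treating a scored win as `min(margin, 100.5)` is an assumption about how “capped” is meant.

```python
CAP = 100.5  # cap quoted in the post


def rated_margin(margin=None, ended_early=False, cap=CAP):
    """Winning margin as it would count for team score and rating.

    margin: the winner's margin on the board, or None if there is no count.
    ended_early: True for a resignation or a timeout.
    """
    if ended_early or margin is None:
        return cap  # resign/timeout is treated as a maximum win
    return min(margin, cap)  # big wins are clipped to the cap


print(rated_margin(20.5))              # ordinary scored game
print(rated_margin(150.5))             # clipped to the cap
print(rated_margin(ended_early=True))  # resignation or timeout
```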

I’m still reading the Lagrange thread for research. The games are not exactly the same (games there still only care about win/loss, while our games do care about the actual point results), but I think I can borrow some of the theory.

Samraku · May 15, 2021, 11:09am

While I would initially dismiss this as it relies on the score difference instead of who won or lost, it really seems to elegantly solve a lot of problems I’ve been having with my implementation of the Points Rating System. This could be a case of the benefits outweighing the harms.

I love that this system provides for games with no handicap or incorrect handicap to be dealt with. Upon further reading, is this intended to be used with reverse komi and/or handicap stones at all?

How do you deal with the degenerate case where the stronger player knows they’re going to lose (after komi), but might still be able to come back if they can start a fight somewhere? If only W/L matters, the stronger player will start the fight and attempt to turn the game around, but with the above system, where the difference in score directly impacts the change in rating, there may be a perverse incentive for the stronger player to just submit to the loss and work to make it as small as possible.

How do you deal with the other possible issue where one player is confident in victory (after komi)? If only W/L matters, that player may legitimately wish to favor safer moves, especially if they are the weaker player or they are very far ahead, but in the above system, both players will still be trying to maximize score? This is not so big an issue as the other mentioned case, and may even be desired behavior.

Another issue is what if one player rated 100 plays an even game with one rated 210? If the 210-player wins by 110 points, do they lose 9.5 * 0.2 == 1.9 ~= 2.0 points (because the score of the game for rating purposes cannot exceed 100.5 and the expected score difference is 110)? Or do you just disallow such matchups, perhaps requiring them to be played with some amount of handicap and/or reverse komi to bring the expected score difference below 100?
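
The arithmetic in that question can be checked with a short Python sketch. It assumes the update is 0.2 times (capped result minus expected margin), as described earlier in the thread; the function name is hypothetical.

```python
K = 0.2      # update factor from the thread
CAP = 100.5  # maximum margin that counts


def rating_change(own_rating, opp_rating, margin, k=K, cap=CAP):
    """Rating change for the winner after winning by `margin` points."""
    expected = own_rating - opp_rating
    return k * (min(margin, cap) - expected)


# 210 vs 100: expected margin is 110, but the result is clipped to 100.5.
change = rating_change(210, 100, 110)
print(round(change, 2))
```

Under these assumptions the change comes out at about -1.9, and no win of any size improves on it, because every margin above the cap is clipped to the same value.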

EDIT: another potential flaw with basing the system on score difference, being forced to play a losing game to the end, especially at high reverse komi. I have given very high (300+) reverse komi to TPK players, and if I don’t unsettle every one of their corners, the game becomes almost immediately resignable. If they were calm, protected their corner stones, they’d have at least a dozen points per corner (area scoring), so almost 50 points, which is more than enough to win a 300 komi game. At that point, I’d resign. But if the gap matters, then I would have to keep playing an utterly lost game to try and minimize the damage. (Note that it’s worse than just not killing the corners, I will be attaching to and “armpit hitting” their corner stones and assuming that anything which escapes to the center can just be killed later, so if it lives, there is no compensation whatsoever on the outside.)

teapoweredrobot · May 15, 2021, 11:34am

I have no idea what this is and am reluctant to Google it…

Chinitsu · May 15, 2021, 1:19pm

So as I mentioned before, the reason this system exists is to serve as the “player cost” of each player in our in-house tournament. The rules of the tournament are as follows:

Game type: team-based point-scoring matches between 2 teams. Each team’s score is the sum of the (territory) points its players win by in their games.
Drafting rule: the 2 team captains pick players for their teams (including themselves as first pick, then usually in 1-2-2-2-1 order). For each player a captain picks, the team pays that player’s cost (the rating value from this system). After the draft, the team that spent more player cost must give the other team a handicap equal to the difference in points.
Matching rule: each captain secretly assigns players to slots 1 through 5, then sends the lineup to a bot. After both captains have sent their lineups, the bot shows the pairings publicly.
Game: as mentioned before, wins are capped at 100.5 points. The number 100.5 is based on the Korean Ban Neki gambling rule (I got the name wrong earlier; manbang in this rule means a win by 100+ points, not the name of the rule itself, @teapoweredrobot). If a player wins by 20 points, they score 20 points for their team.

After summing up the scores of all 5 matches, the team with more points wins. Each player then gets their “player cost” adjusted based on their performance in their match (differenceFromExpectedResult*0.2).
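
The drafting handicap and team scoring described above could be sketched roughly like this in Python. The function names and the sample numbers are invented for illustration. Two things are assumptions rather than stated rules: that a losing player contributes 0 to their team, and that the draft handicap is simply added to the cheaper team’s total.

```python
CAP = 100.5  # maximum margin a single game can contribute


def draft_handicap(costs_a, costs_b):
    """Points team A must give team B (negative means B gives to A)."""
    return sum(costs_a) - sum(costs_b)


def team_totals(margins, costs_a, costs_b, cap=CAP):
    """Return (total_a, total_b) after all boards and the draft handicap.

    margins: one entry per board, positive if team A's player won by that
    many points, negative if team B's player won.
    """
    total_a = sum(min(m, cap) for m in margins if m > 0)
    total_b = sum(min(-m, cap) for m in margins if m < 0)
    handicap = draft_handicap(costs_a, costs_b)
    if handicap > 0:
        total_b += handicap  # A spent more, so B receives the difference
    else:
        total_a -= handicap  # B spent more (or equal), A receives it
    return total_a, total_b


costs_a = [120, 96, 84, 60, 48]  # made-up player costs, sum 408
costs_b = [108, 96, 72, 72, 36]  # made-up player costs, sum 384
margins = [20.5, -5.5, 130.5, -30.5, 2.5]
totals = team_totals(margins, costs_a, costs_b)
print(totals)
```

In this made-up match, team A spent 24 more in the draft, so team B starts 24 points up before any board is counted.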

You’re right. Currently the system is designed specifically for no-handicap games. Players just look at the difference in player cost between themselves and their opponent to know how much they have to win or lose by to avoid becoming “dead weight” for their team.

I think it’s a fair strategy and could become a player’s playstyle (coughtheonewritingthisreplycough). Statistically, though, I think a risky playstyle and a consistent-loss playstyle will probably lead to similar ratings.

I want to note, though, that with this team rule, winning by more points scores more valuable points for the team. So we’ll probably see more risky playstyles than consistent ones.

This is exactly the behavior we hope for. We want to steer the games toward a more exciting, gambling-like experience for both the players and the spectators.

The desire to gain even more points will probably lead to game-losing mistakes, and that’s where the fun comes from.

Samraku:

Another issue is what if one player rated 100 plays an even game with one rated 210? If the 210-player wins by 110 points, do they lose 9.5 * 0.2 == 1.9 ~= 2.0 points (because the score of the game for rating purposes cannot exceed 100.5 and the expected score difference is 110)? Or do you just disallow such matchups, perhaps requiring them to be played with some amount of handicap and/or reverse komi to bring the expected score difference below 100?

EDIT: another potential flaw with basing the system on score difference, being forced to play a losing game to the end, especially at high reverse komi. I have given very high (300+) reverse komi to TPK players, and if I don’t unsettle every one of their corners, the game becomes almost immediately resignable. If they were calm, protected their corner stones, they’d have at least a dozen points per corner (area scoring), so almost 50 points, which is more than enough to win a 300 komi game. At that point, I’d resign. But if the gap matters, then I would have to keep playing an utterly lost game to try and minimize the damage. (Note that it’s worse than just not killing the corners, I will be attaching to and “armpit hitting” their corner stones and assuming that anything which escapes to the center can just be killed later, so if it lives, there is no compensation whatsoever on the outside.)

Right now we don’t have a 100+ rating difference yet (the maximum is 60 right now), but we will probably disallow matchups with an 80+ difference. I think it’s similar to how you can’t play ranked games with 6+ handicap stones on OGS. So this system also has a limit on how big a rating difference it can handle, like the other system.

(Another thing is that an even game against someone 9 ranks below you is a very boring experience for the player, which goes against our aim.)
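
A matchup limit like the one floated above might look as follows in Python. The names are hypothetical, the limit of 80 is only the tentative figure from the post, and reading “80+” as “a gap of 80 or more is disallowed” is an assumption.

```python
MAX_GAP = 80  # tentative limit mentioned in the post
CAP = 100.5   # maximum margin that counts


def matchup_allowed(rating_a, rating_b, max_gap=MAX_GAP):
    """True if the rating gap is below the limit."""
    return abs(rating_a - rating_b) < max_gap


def best_possible_surprise(rating_a, rating_b, cap=CAP):
    """Most points the stronger player can beat expectation by."""
    return cap - abs(rating_a - rating_b)


print(matchup_allowed(120, 60), best_possible_surprise(120, 60))
print(matchup_allowed(210, 100), best_possible_surprise(210, 100))
```

Keeping the gap below the cap means the stronger player can still exceed the expected margin, which is not possible in the 210 vs 100 case raised earlier.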

Samraku · May 15, 2021, 2:20pm

Sounds like you’ve found a very good principle for your use case, then.