For rating systems in chess (so no handicap) it makes sense to use binomial deviance*, which goes by a lot of other names. I’m not entirely sure how you would use it for handicap games.
Well, to me it seems fairly standard from the point of view of “Well, this is roughly how far the results spread from the expected score on average.”
The problem with “correctly labelled / all samples” is that it doesn’t take into account that the system isn’t trying to get correctly labelled samples; it’s trying to move each player toward an estimated rank that’s as close to that player’s true rank as possible. Because of this, a major upset (such as a 13k beating a 1d) should prove a bigger source of …
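To make the difference concrete, here is a minimal sketch of both measures. It assumes the usual log-loss form of binomial deviance (natural log; another base only rescales it), and the win probabilities are made-up numbers rather than output from any real rating system. The function names are just for illustration.

```python
import math


def binomial_deviance(predicted, outcome, eps=1e-12):
    """Per-game binomial deviance (log loss) for one prediction.

    predicted: predicted win probability for the favourite, in (0, 1)
    outcome:   1 if the favourite won, 0 if the favourite lost
    """
    # Clip so that a prediction of exactly 0 or 1 cannot produce log(0).
    p = min(max(predicted, eps), 1.0 - eps)
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1.0 - p))


def accuracy(predictions, outcomes):
    """The "correctly labelled / all samples" measure."""
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(predictions, outcomes))
    return hits / len(outcomes)


# Hypothetical numbers: a lopsided pairing where the favourite was given 95%,
# and a close pairing where the favourite was given 55%. Both favourites lose.
upset = binomial_deviance(0.95, 0)       # large penalty
close_loss = binomial_deviance(0.55, 0)  # small penalty
```

Accuracy counts both games as one miss each, whereas the deviance of the upset is several times larger than that of the close loss, which is the behaviour the paragraph above is asking for.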
*It might be worthwhile to look at this solution: http://www.chessmetrics.com/KaggleComp/1-TimSalimans.pdf The post-processing part is quite brilliant. Talk about Goodhart’s Law.