Maybe I’m incredibly dense, but what did flovo do differently between the red line and the green line? Does it have higher or lower predictive power (since that’s the true point of ratings/rankings)?
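To make "predictive power" concrete, here is a rough sketch of how two sets of ratings could be compared on the same game results. Everything in it is an assumption for illustration: the players, games, and ratings are made up, the names `expected_score` and `log_loss` are mine, and the win probability uses the plain Elo logistic curve with a 400-point scale, which is not necessarily what OGS or flovo actually use.

```python
import math

def expected_score(r_a, r_b, scale=400.0):
    """Elo-style probability that the player rated r_a beats the player rated r_b.
    (Assumed model for illustration; a Glicko-style system would also use RD.)"""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))

def log_loss(games, ratings):
    """Mean negative log-likelihood of the observed results. Lower is better."""
    eps = 1e-12  # keep log() away from 0
    total = 0.0
    for a, b, a_won in games:
        p = expected_score(ratings[a], ratings[b])
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p if a_won else 1 - p)
    return total / len(games)

# Hypothetical results: (player_a, player_b, did player_a win?)
games = [
    ("alice", "bob", True),
    ("alice", "carol", True),
    ("bob", "carol", True),
    ("carol", "alice", False),
]

# Two hypothetical rating assignments for the same players.
ratings_a = {"alice": 1700, "bob": 1500, "carol": 1300}
ratings_b = {"alice": 1500, "bob": 1500, "carol": 1500}

print("system A log loss:", log_loss(games, ratings_a))
print("system B log loss:", log_loss(games, ratings_b))
```

Whichever system gets the lower log loss on games it has not already been fitted to is the one with better predictive power, at least by this measure. A fair comparison would score each game using only the ratings as they stood before that game was played.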
This is actually a really serious issue with rating systems that people do attempt to address (look up Whole-History Rating), and it is part of what RD is supposed to address (the “this is where we think you are”, not “this is how far we think you can move”).
Sure, and that makes sense, but keep in mind that most ratings systems are, in a way, statistical theories (which is why they always come in academic papers), and thus need some sort of data to back up their claims.
It’s possible that the OGS implementation is not that great (I’d like to have more details about flovo’s implementation), but your problem lies either with the implementation or with the theory, and the battle over theory is not an easy one to fight…