How is WHR Adapted to Handicap Games?
I originally posted this on the Arimaa Forum, but the concept equally applies to adapting WHR to handicap games (or reverse komi games) in Go. Feel free to reply on either platform; I will repost any content-heavy posts (with credit, of course) to the Arimaa Forum post.
How to Even Start?
Arimaa has 864 possible handicaps including the even game. Do each of the handicaps get a rating? Probably. How is it calculated? WHR as for the player ratings? Or would Baysian Elo be better since one does not expect the value of a given handicap when played at the same rating (handicaps will not have the same value in WHR rating points across different skill levels) to ever change, unlike player ratings?
Presumably each handicap might be associated with some function(rating) = handicapValue
where “rating” is the average of the two players ratings, and “handicapValue” is the number of WHR rating points would be added to the rating of the player being given the handicap in order for the WHR system to correctly predict the win chance of both players without knowledge of the handicap? Thus when any two players play, the system would get their average/median rating, and plug it into the associated functions of each (optimizations definitely possible) handicap, and suggest the handicap for which the rating of the weaker player plus the result of the suggested handicap’s associated function is a close as possible to the rating of the higher ranked player?
Processing Games
Player Ratings
Once the game is played (and it might have been played at any handicap at all: even one wildly inappropriate like me giving browni3141 (the strongest Arimaa player today) 7 rabbits) two systems must appropriately handle the result. The normal WHR would need to take into account that the expected result has been influenced by the used handicap: and that handicap may not be at all what the system suggested: the system can not rely on all games being roughly even.
I am still struggling to grok or even understand WHR (and, more generally, Bayes’ Theorem), but my intuition says that this particular adjustment would be fairly trivial for someone who grokked vanilla WHR.
Handicap Ratings
The handicaps must also be adjusted, though. I feel that me trying to think of how to do this is like someone who has no idea how baseball works except that it uses a baseball bat, and deciding to try and run around clocking people over the head with it because maybe that’s how baseball works? Is there a parallel WHR system? Is it all the same system? are the rating points for the handicaps in a 1:1 correspondence with the rating points difference which they cover? If so, how is the different worth of the same handicap across skill levels handled? If not, how are the handicap ratings converted to and from suggested handicaps? How does one prevent degenerate behavior with two systems influencing eachother: seems like a recipe for feedback loops. Is WHR used or Baysian Elo or something else entirely?
Handicap Relative Size
Presumably a solid system such as WHR or Baysian Elo would correctly evaluate the strength of different handicaps relatively quickly and accurately, but with 864 handicaps to rate, most of them absurd, and no domain-specific knowledge of their relation to eachother before WHR starts to work its magic one game at a time, I fear it would take a lot of seemingly random suggested handicaps before the system learned that 7 rabbits 2 horses and an elephant ([7, 0, 0, 2, 0, 1]
)is not much more appropriate a handicap for most matchups than 6 rabbits 2 horses and a camel ([6, 0, 0, 2, 1, 0]
) would be. But some handicaps, such as the two seen above, are just straight up better than others. I do not know how or even if such knowledge could be used to bootstrap the system (maybe setting the initial ratings roughly or seeding the system with some sample games demonstrating the relative strengths of handicaps for the initial prior?) but the following metric seems like it would go a long way towards reducing the number of games which would need to be played against crazy handicaps before the vast majority of them gained enough rating to virtually never be seen suggested, and to be decently accurate when they were used anyway:
let h1, h2 be handicaps denoted by arrays a1[6], a2[6] respectively.
let subtraction of two arrays be defined as returning an array with length equal to the longer of the two input arrays (extra 0s being imagined to be appended to the shorter array) with out[n] = arrayIn1[n] - arrayIn2[n] for each n in out[]. I don’t know if that makes sense as subtraction, but call it something else if you like; this definition is useful here regardless of what it’s called.
let tempArray = a1 - a2
for (int x = 5; x >=0; x--) {
if tempArray[x] != 0 {
if tempArray[x] < 0 {
break; // (not sure if break works like this; but this is all pseudo-code at best anyway): exit for loop: h1 cannot be said to be strictly better than h2 by this metric.
} else {
tempArray[x] = tempArray[x] - 1;
for (int y = x; y <= 0; y--) {
if tempArray[y] < 0 {
tempArray[y] = tempArray[y] + 1;
break; // again, probably wrong syntax: my intent is to break out of the inner for loop but not the outer here
} // end if
} // end inner for
x++; // counteract the decrementing of the counter because I shouldn't have used a for loop here. :D
} // end < 0 if-then
} // end != 0 if-then
} // end outer for
There’s definitely a cleaner way to do this (syntax highlighting and debugging might help, lol) but the idea is that once it exits, if tempArray has no negative values, then h1 can be said to be strictly better than h2 by this metric. It of course doesn’t work if h1 = h2, but why would you test that?
If applied to every possible pair of handicaps, this will create a lot of directed paths between handicaps where the arrow is pointing towards a handicap strictly smaller than the handicap the arrow originates from. Since all this is transitive, redundant arrows can be skipped. Perhaps this can somehow expedite the process of rating the handicaps?
Conclusion
Does anyone have any idea how to do this? I’m sure someone must have implemented it, maybe for Go handicap stones or something, but I can’t find anything by googling. Any links to good explanations of WHR would also be awesome if you have them on hand; maybe something will click with me. How did everyone else begin to grok WHR? Would going line by line through one of the implementations of it (it seems there’s a Ruby one and a Python adaption of the same one on github) be helpful, or would I just be wasting my time if I don’t understand the math from Column’s paper first?