Ongoing server maintenance

tl;dr - if you get loading issues, try reloading the page. They should be resolved by later today, I think.


Hello all,

We’re going through an update of our Cassandra cluster, the database system used to store game state as well as serve read requests for various things. Specifically, we’re increasing our replication factor from 2 to 3. Unfortunately, this means that we need to do a “full rebuild” of all the data stored on each node, and that takes hours. While this is going on, we’re seeing lots of read errors as nodes think they can now serve data that they don’t yet have.

In theory, this only affects reads of existing data, so reloading should work at least 2 out of 3 times. All writes (i.e. game moves and updates to anything) should now be written to all three responsible nodes and should seal the updated data, so there won’t be any issues loading that particular data going forward.
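For anyone curious where the "2 out of 3" figure comes from, here is a rough sketch of the arithmetic. It assumes each read is served by a single replica picked uniformly at random, and that reloads are independent of each other. Neither assumption is confirmed above, and the function name is made up for illustration.

```python
from fractions import Fraction


def read_success_probability(replicas_with_data: int,
                             replication_factor: int,
                             attempts: int = 1) -> Fraction:
    """Chance that at least one of `attempts` reads hits a replica holding the data.

    Assumption (not from the post): every read goes to one replica chosen
    uniformly at random, and attempts are independent.
    """
    if not 0 <= replicas_with_data <= replication_factor:
        raise ValueError("replicas_with_data must be between 0 and replication_factor")
    # Probability that a single read lands on a replica that lacks the data.
    miss = Fraction(replication_factor - replicas_with_data, replication_factor)
    return 1 - miss ** attempts


# During the rebuild: 2 of the 3 responsible nodes already hold the old data.
for attempts in (1, 2, 3):
    p = read_success_probability(2, 3, attempts)
    print(f"{attempts} attempt(s): {p} (~{float(p):.0%})")
```

Under those assumptions a single load succeeds 2/3 of the time, and one reload brings that to 8/9, which is why reloading is usually enough.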

So, while this is going on, you can expect some trouble loading from time to time, but it should get gradually better as the repair process works its way through all the data.

After this is all done, I think it will fix the issue where, when one of the nodes fails (which they do a surprising amount), we see games rolling back to previous states and whatnot.

However, today might be a little bumpy, so sorry in advance for the troubles.

– anoek

22 Likes

Aye aye captain :saluting_face: the seas may be choppy today but I trust you to get us to land.
Thank you!!

9 Likes

Hey @anoek !

Thanks for the notification. Out of curiosity, do you mind sharing some tech details such as the specs of the Cassandra cluster? Curious to see what you have built. I helped run some MongoDB clusters in the early 2000’s until the data was migrated by another team to Cassandra.

Cheers

2 Likes

Sure, it’s quite small but handles our load just fine: 3 nodes, each with 2 vCPUs, 16GB RAM, and (now) 300GB SSD storage.

2 Likes

Thanks for getting rid of that frozen move indicator.
:kiss:

I believe we’ve passed the read-error stage, but if anyone has any troubles please let me know!

5 Likes

Not sure if this is related, but I have a persistent indicator in the upper right for it being my turn in a game even when it is not my turn.

When I click it, it takes me to this game (Ladder Challenge: Hoogsteentje(#2659) vs pekoe(#2651)) even if it’s the opponent’s turn.

2 Likes

Thanks for letting me know. That one is unrelated, but it’s a recurring issue I know how to fix. I’ll fix it up after the server maintenance is complete (which will be a while still).

6 Likes

me too

I’ve been getting errors yesterday and today which I’ve never had before, whereby the thing that says how many games it’s my turn in is showing zero, but when I go into my profile I see that in fact it is my turn in one or more games.

Yesterday I was also getting the thing where it kept saying it was my turn even though it wasn’t.

There seems to be one particular game this problem is most persistent with.

1 Like