Server upgrade issues


#1

Dear OGS,

The recent update, aimed at improving site stability, triggered a problem in which our game servers began to overload our storage servers. I’ve since throttled that back, and subsequent restarts no longer cause it. However, the overload caused game load failures, which triggered the migration code we wrote to move smoothly from the old storage platform to the new one. This caused a recurrence of the bug we saw 3 days ago, when a number of correspondence games timed out because of a very different storage system failure. In that first case I was able to restore games to their previous state, because the data was simply inaccessible during the migration. Unfortunately, in this case the old data was re-migrated and overwrote the current data, and I will not be able to recover the lost moves.

This is a terrible failure on my part, and I apologize profusely. I’ll be working through the night to accelerate the migration so we can get rid of the problematic migration code once and for all.

- anoek


#2

#3

What do we do about corrupted games? I’ve had two games rolled back to a very early point. I was about to close out a couple of wins. Can we request that those games be annulled?


#4

That’s bad news.
I hope you get it done so you’ll be able to get some sleep again soon :sleepy:

best of luck!


#5

Thanks for the update, @anoek. Hopefully everyone recognizes the hard work you’ve put into this upgrade. I think folks may have been spoiled by the general stability of the last few years. I’m comfortable with some hiccups now for what I expect will be a rock-solid server later.


#6

Sorry to hear that. If the game results cannot be changed, can we at least give the correspondence tournament directors the ability to re-qualify players? Many players are unfairly disqualified from the tournaments now and allowing requalification is a simple way to greatly reduce the frustration if nothing else could be done.

Don’t stay up too late, though :)


#7

Can’t speak for everyone else but I love this site/community and I really appreciate the work you put into maintaining and upgrading it.


#8

Hi @kennyjay,

[quote="kennyjay, post:3, topic:10772"]
Can we request that those games be annulled?
[/quote]

Being one of the less active mods, I don’t know whether this will be feasible …

I personally might even go for a (probably very unpopular) solution: annulling all running games and restarting them, maybe even starting with a totally clean (i.e. empty) game database, with everybody beginning at the last rank they had. That might save many people further frustration while the last bugs are being ironed out. (If you don’t like this idea, relax, I was just playing out a variation :wink: )


One structural problem, I think, is simply that too few people joined the beta test. Most want to keep ranking up instead of playing on a server where any progress can be erased whenever the beta server is reset.


#9

I realize that no one could predict that the server upgrade would have so many issues, and I appreciate everyone’s hard and unpaid work behind the scenes, both on creating the upgrade and on trying to debug it. And I suspect that restarting all the affected games isn’t a feasible option at this point. But in those games that erroneously timed out, could you perhaps split the baby? Keep (i.e. count) the win for the erroneous (or at least premature) winner of those timed out games, but not count the loss for the losers? If I only had a couple of games affected, I wouldn’t have cared much either way, but I was erroneously timed out of 53 games.


#10

That you don’t have database backups is very troubling.


#11

Hey, I know it’s frustrating for some, but geez, all in all it’s just a few games (and even if we have different ideas about what “few” means, they are still just games). No harm done, no need to cancel your beauty sleep, and nothing to stress about too much. :slight_smile:

Climbing back a few ranks can be a good practice :stuck_out_tongue:


#12

Wouldn’t a cleared game database also mean losing the numerous demo games too? If it comes to that, can there PLEASE be at least a warning ahead of time so we can download SGFs of the games we wish to keep?

Actually, I’m going to do that now for some games. On the other hand, maybe I should not treat OGS as a game database…


#13

:bangbang: Yup, as I’d suggest it, we’d clear everything except for current ranks :smiley:

:construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction:

But rest easy, I have nothing to do with the programming :wink: That was just a naïve idea to show that I don’t really care about the past (and thus about the past games) when OGS is the best place to play Go.

:construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction::construction:


#14

Well, if that’s the only way to resolve the current situation, I don’t mind. It’s better than having too many games to play and losing many of them by timeout due to the bugs. I just wonder why these kinds of risks were not considered before the upgrade…


#15

I don’t think that wiping out all games is a good idea. I suggest annulling the bugged games and reverting the rating changes they caused. It is not urgent; it could be done in a few days or even weeks. You can’t change the results of the games, but I hope you are able to fix the consequences.


#16

Can I have my timeout penalty removed?


#17

I have a thought: if OGS is using something like a MySQL database, a quick and dirty database fix could work (the table and column names here are just my guesses at the schema):

UPDATE games SET status = 'unfinished', timeout = FALSE, time1 = max_allowed_time, time2 = max_allowed_time WHERE status = 'finished' AND timeout = TRUE AND end_date >= '2017-02-13';

That way, all games that timed out after the server update could be continued from the point of timeout. (I’m not an SQL expert, but you get the idea. The conditions go directly in the WHERE clause because MySQL won’t let an UPDATE select from the same table in a subquery.)

Reverting the tournament status in the database is also possible (it just needs somewhat more complicated SQL). If you’d like any help, I’m more than willing to help clean up the wrong results.


#18

There is a backup, but the good backup is a few days old apparently. The storage instability may be preventing a new clean backup.

I’ve met the devs, and they are professionals who hold their work to a high standard. They will make the server rock solid again. This update is designed to reduce the risk of future upgrades. I cannot imagine anyone is more upset by the current situation than they are.

I’m kicking myself for not finding the time to test the system on Beta.


#19

Time for all of us to exercise our memory. Just imagine somebody knocked over your go board! :slight_smile:


#20

lol… or 53 of them :smile: Sorry, @DK1.