2025 OGS game dump

Has anyone updated the OGS game dump since 2021? If not, then I’d be happy to catch it up from where it was left off.

Was the approach to hit the termination API for each game ID in sequence and save the resulting JSON objects (for all the ones that are publicly accessible)? So I should find the largest ID from the 2021 dump and just run a huge for loop from that ID onwards?
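A minimal sketch of the loop described above. It assumes a caller-supplied `fetch_game(game_id)` function (hypothetical; it stands in for whatever HTTP call hits the per-game endpoint) that returns a dict for publicly accessible games and `None` otherwise. The name `crawl_range` and the JSON Lines output format are illustrative choices, not necessarily how the 2021 dump was produced.

```python
import json
from typing import Callable, Iterable, Optional


def crawl_range(
    game_ids: Iterable[int],
    fetch_game: Callable[[int], Optional[dict]],
    out_path: str,
) -> int:
    """Fetch each game ID in order and append accessible games to a JSON Lines file.

    Returns the number of games saved.
    """
    saved = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for game_id in game_ids:
            game = fetch_game(game_id)  # hypothetical helper supplied by the caller
            if game is None:  # not accessible: skip it
                continue
            out.write(json.dumps({"id": game_id, "game": game}) + "\n")
            saved += 1
    return saved
```

Something like `crawl_range(range(start_id, end_id), fetch_game, "games.jsonl")` would then walk the IDs from the end of the previous dump onwards.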

3 Likes

It would be nice to have another game dump (and an annual update in future years). Lots of scope for doing database searches to see how joseki and fuseki are evolving at “mere mortal” level as opposed to pro level.

1 Like

I’ve started another download from the point @za3k left off, that is from game 36611000. It’s making a bit of progress at 20/s. FYI @anoek, LMK if it’s any trouble.

1 Like

Seems fine so far

3 Likes

Actually, I have it updated to 2023; I just didn’t get around to publishing it. Sorry! I was having some issues (possibly false positives) post-processing the JSON into SGF collections, and then got distracted.

If you bug me again next week (I’m about to drive out of state ATM) I can update it. [of course feel free to download in parallel too] My email is za3k@za3k.com if you want to coordinate.

@anoek, if I can hijack the thread a bit… is there any documentation on what “rank” means, in terms of displaying a kyu/dan rating? And is the rank always guaranteed to be the player’s rank at the time the game was played?

1 Like

I think that’s correct in this context

I assume you’re referring to the numeric value and how we turn that into a Kyu/Dan rank, it’s just this:

0 = 30k
1 = 29k

and so on
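For anyone post-processing the dump, the mapping above could be sketched as follows. This assumes the sequence simply continues past 1k (29) into dan ranks (30 = 1d) and that fractional values are rounded down; both are assumptions beyond the two data points given, and professional ranks are not handled. `rank_to_label` is a made-up name.

```python
import math


def rank_to_label(rank: float) -> str:
    """Convert a numeric rank to a kyu/dan label (0 -> 30k, 1 -> 29k, ...)."""
    whole = math.floor(rank)  # assumption: fractional ranks round down
    if whole < 30:
        return f"{30 - whole}k"
    # Assumption: the sequence continues past 1k into dan ranks (30 -> 1d).
    return f"{whole - 29}d"
```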

1 Like

Thanks anoek. I thought I was seeing something else, but probably I was mistaken. Will check when I’m back in state.

@siimphh I’ve resumed downloading from ID=56,130,746. I’ll post the JSON once I get it. ETA is maybe June 7. (That covers the first download pass, plus a second pass for any games that failed the first time around.) I’ll upload the JSON immediately this time so you have a copy, and only later deal with the SGF generation for the general public.
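The two-pass idea can be sketched roughly like this, assuming a hypothetical `fetch_game(game_id)` that returns the game as a dict and raises an exception on failure. The function name and the choice to retry each failure exactly once are illustrative, not a description of the actual crawler.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def download_two_pass(
    game_ids: Iterable[int],
    fetch_game: Callable[[int], dict],
) -> Tuple[Dict[int, dict], List[int]]:
    """Download every ID once, then retry the failures once.

    Returns (games by ID, IDs that failed both passes).
    """
    games: Dict[int, dict] = {}
    failed: List[int] = []
    for game_id in game_ids:  # first pass
        try:
            games[game_id] = fetch_game(game_id)
        except Exception:
            failed.append(game_id)

    still_failed: List[int] = []
    for game_id in failed:  # second pass over first-pass failures only
        try:
            games[game_id] = fetch_game(game_id)
        except Exception:
            still_failed.append(game_id)
    return games, still_failed
```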

4 Likes

Hehe, my download had gotten as far as 53518706 but I’ve stopped it for now then and I’ll wait for yours to finish and grab it when ready! Thanks for picking it up again!

For my purposes, I’m definitely more interested in the raw JSON because I’m fishing out some extra pieces of metadata that don’t typically end up in SGF (ranked vs. unranked and broad game clock categories; I’m not using the final board counting result yet, but that’s also interesting). So yeah, if you can make a JSON download available when it’s ready, I’d appreciate it ahead of the SGF, even though more people would probably be looking at the SGF files in general.

I’ll check back second week of June-ish then!

2 Likes

@za3k, how’s the JSON download looking?

Thanks for the reminder. It had completely stalled at 1%, and I have no idea why. I just reset the crawler.

I’m back at home so I’ll try and keep an eye on it more actively

2 Likes

Oh shoot! I’ve started mine up as well, maybe it will also be useful to compare notes later about which games are found and which ones aren’t. So far I had seen stretches of game ID ranges that all return 400, and some individual 409 if I recall correctly. Other errors seem to be retriable.
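One way to encode that observation in a crawler is a small status classifier. This is only a sketch: treating 400 and 409 as permanent and everything else other than 200 as retriable is an assumption based on what I saw, not documented server behaviour, and the names `Outcome` and `classify_status` are made up.

```python
from enum import Enum


class Outcome(Enum):
    SAVE = "save"    # got the game, write it out
    SKIP = "skip"    # treat as permanently unavailable
    RETRY = "retry"  # queue for another attempt


# Assumption: these codes never succeed on retry for a given game ID.
PERMANENT_STATUSES = frozenset({400, 409})


def classify_status(status: int) -> Outcome:
    """Decide what to do with a response based on its HTTP status code."""
    if status == 200:
        return Outcome.SAVE
    if status in PERMANENT_STATUSES:
        return Outcome.SKIP
    return Outcome.RETRY
```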

I’d also set things up to write files by game start time, and was very surprised to find occasional games with wildly out-of-sequence start times, off by months or even years.

Some games are uploaded pro games. For example there are games whose “start” time is in the 1700s.

Not sure if that’s what you’re seeing, or something else.

Edit: Working fine BTW, at 6% already. Not sure what happened last time.

1 Like

@anoek has requested that siimphh and I both stop crawling for the time being, because the servers are getting too loaded. I’ve stopped on my end.

@siimphh, let’s make sure at most one of us is crawling OGS at a time once we restart, to avoid the combined load of two crawlers. I’ll let you finish everything you want to do first.

Please coordinate with @anoek to figure out how to crawl in a reasonable fashion (endpoint, rate, monitoring to make sure nothing is overloaded, and so on).
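On the rate side, a simple client-side throttle might look like the sketch below. The `Throttle` class is hypothetical and the rate passed in is a placeholder; the real limit is whatever gets agreed with @anoek. The clock and sleep functions are injectable only so the logic can be checked without actually waiting.

```python
import time
from typing import Callable


class Throttle:
    """Space out calls so that at most `rate_per_sec` happen per second."""

    def __init__(
        self,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate_per_sec <= 0:
            raise ValueError("rate_per_sec must be positive")
        self.min_interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following request one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Calling `throttle.wait()` before each request keeps a single crawler under the agreed rate; it does nothing about two crawlers running at once, which still needs the coordination described above.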

I’ll keep an eye on this thread and any DMs. Until I hear a go-ahead from both of you, my crawler is stopped.

4 Likes

@siimphh, have you re-started your crawl yet?

1 Like

I started it:

2025-06-18 19:24:56,880 2573174 INFO ogs-get-sgf.main():195: 1000 JSON saved, latest_game_id=54408517

@anoek, is it looking OK?