This collection has been updated with games from 2021 to 2025-05. The number of games has approximately doubled, from 27 million to 56 million.
It’s available in the same formats as before, on my website (link on first post):
- JSON (one per line) of each game, unmodified from OGS
- SGF files (one per game), in .tar.gz files. There are two downloads – one organized by date, one by player. Bots are excluded from the player collection
- Uploaded games (0.7% of games) are excluded temporarily because I discovered a privacy concern, but they’ll be up eventually
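For anyone starting with the JSON download, here is a minimal sketch of reading a one-game-per-line file in a streaming way. The `iter_games` helper and the `id` field in the sample are made up for illustration; check a real line from the download for the actual field names.

```python
import json
from typing import Iterable, Iterator


def iter_games(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed game per non-blank line of a JSON-lines stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Tiny in-memory sample; the "id" field is an assumption for illustration only.
sample = ['{"id": 1}', '', '{"id": 2}']
games = list(iter_games(sample))
```

With a real file you would pass the open file object instead of a list, e.g. `with open(path, encoding="utf-8") as f:` and then `for game in iter_games(f):`. That reads one game at a time, so the full collection never has to fit in memory.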
I’ve made a best effort to fix any problems people mentioned to me the first time around. Weird names shouldn’t cause any bugs, and I think ranks in the SGFs should match the ones displayed on OGS now. (“[?]” rank will display as OGS’s best guess at the time)
Feel free to report (or fix) any problems over at the za3k/ogs GitHub repository ("Crawl online-go.com and package up the results for the public"), which includes the crawler and SGF converter code.
As before, I probably don’t have a lot of time to answer “how do I…” questions on what to do with the data once you have it. But I’ll keep an eye out if people spot any problems in the actual dataset (missing games, wrong ranks, that kind of stuff).
Everything is a bit more streamlined now, so hopefully I can keep things more up to date going forward.
Thanks to hexahedron, anoek, siimphh for help on this update!
P.S. If a mod could update the top post with the updated year range, that would be great (the new range applies to the za3k.com link but not the Internet Archive copy, which is still stuck in 2021 for now). Also copy+paste the GitHub link, I guess?
P.P.S. @anoek you or other OGS devs might find this list of interest: game-blacklist.txt in the za3k/ogs GitHub repo (master branch). All of these database entries appeared to me to have some problem with them. The most common is a winner who isn’t one of the players.