Getting access to every SGF on OGS

Hi amazing people!

I’m currently working on a project for which I require an insane amount of SGFs. I started downloading games from the API one by one with a scraper I made. However, I’m getting hit by so many 429s at this point, it’s just too slow.

I’m currently faced with 2 options. Either I stick with this approach and add more computers that constantly download games, or I change the way I’m doing things.

So I was wondering if there is a way to simply access every single SGF file behind the API and download the lot. Not only would that be simpler, but I’m also starting to feel like I’d be DDoSing OGS to some degree by adding more computers pinging the API nonstop.

If you could help me out, I’d really appreciate it.

Thank you very much!
BoneSaw

There’s no source of all SGFs ever (like the one that lichess has), but there are users who have downloaded quite a few.

You can find previous attempts by searching around the forums a bit.

Is there a source for parts of those SGFs? Or just one by one through the API?

I believe it’s a slow process no matter how you do it (people usually sleep a few seconds between API calls to avoid getting an error). However, one technique to get game metadata is to use the https://online-go.com/api/v1/players/{player_id}/games endpoint, because it returns many games per request. No SGF though.
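Roughly what I mean, as a minimal sketch rather than a tested client. I’m assuming the endpoint returns paginated JSON with a `results` list and a `next` URL; I haven’t verified that here. The function names and the 3-second delay are just my own placeholders.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Optional

GAMES_URL = "https://online-go.com/api/v1/players/{player_id}/games"


def http_get_json(url: str) -> dict:
    """Fetch one URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_player_games(
    player_id: int,
    fetch: Callable[[str], dict] = http_get_json,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield game metadata page by page, sleeping between requests."""
    url: Optional[str] = GAMES_URL.format(player_id=player_id)
    pages = 0
    while url:
        page = fetch(url)
        # Assumed response shape: {"results": [...], "next": "<url or null>"}
        yield from page.get("results", [])
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
        url = page.get("next")
        if url:
            sleep(delay_s)  # be polite: wait before asking for the next page
```

Usage would look like `for game in iter_player_games(12345, max_pages=2): print(game.get("id"))`, where 12345 is a made-up player ID.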

As far as where you can get the downloaded SGFs, I think users have made their data available in the past, but again, you’d have to search around the forum (or maybe reddit too) a bit.

There are, what, 150,000 professional kifu from serious games?

OGS has over 30,000,000 kifu at this point, so around 200x more. Although most of them are probably 9x9 or man-vs-bot.

The 429 errors indicate that OGS is intentionally rate limiting you to avoid spending too many resources on serving just these requests.

Please don’t add more computers just to circumvent rate limiting! That would effectively be no different from launching a DDoS attack.
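For what it’s worth, a single well-behaved client would slow down when it sees a 429 instead of pushing harder. Here is a rough sketch of that idea. I’m assuming the server may send a numeric `Retry-After` header (I don’t know whether OGS actually does), and it falls back to exponential backoff otherwise. The function names, base delay, and cap are my own choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                base_s: float = 2.0, cap_s: float = 300.0) -> float:
    """Seconds to wait: numeric Retry-After if present, else exponential backoff."""
    if retry_after is not None:
        try:
            return max(0.0, min(float(retry_after), cap_s))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to backoff
    return min(base_s * 2 ** attempt, cap_s)


def fetch_with_backoff(url: str, max_attempts: int = 6,
                       opener: Callable = urllib.request.urlopen,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """GET a URL, waiting and retrying whenever the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only rate-limit responses are retried
            sleep(retry_delay(err.headers.get("Retry-After"), attempt))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")
```

The point is that the wait grows each time the limit is hit, so the load on the server goes down rather than up.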

I guess the real question is how to download a ton of SGFs from OGS without going through the API at all, which would be more efficient for both OGS and the user making the request. As far as I’m aware, there is no publicly accessible repository for this bulk SGF data. The site operators could conceivably provide it as a special request, but I imagine such a request would be better received with some explanation of the project, and of how and why so many SGFs are needed.

Indeed. Maybe if someone offered to host said SGFs in the future, it would be easier to get buy-in from the powers that be :)

How big a chunk of data would we be talking, do you think?

Perhaps it could be put on Archive.org or somewhere…

I’m making a project that aims, in the first iteration, to create a repertoire of games showing what different levels of players from different eras played. I currently have everything done to ask questions like “In a position p, what were the moves played by White in real-life games between 1960 and 1980 by single-digit kyu players in even games (Chinese rules) where there was no more than one stone’s difference between the two players and White won by points?” or “What are the moves that Oh Yujin 7p played in a position p?”

With KataGo, we could learn really interesting things about different ranks (e.g., how many points on average does a 7k lose per move?).
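To make that concrete, here is a small sketch of how such a number could be computed. It assumes you already have, from some engine analysis, the estimated Black lead in points after every move (the input format and the function name are hypothetical, and no KataGo API is called here). Apparent gains are clamped to zero so that engine noise doesn’t cancel out real losses; that clamping is a design choice, not a standard.

```python
from statistics import mean
from typing import Dict, List, Sequence


def mean_point_loss(black_leads: Sequence[float],
                    black_moves_first: bool = True) -> Dict[str, float]:
    """Average points lost per move for each color.

    black_leads[i] is the estimated Black lead after i moves,
    so black_leads[0] describes the starting position.
    """
    losses: Dict[str, List[float]] = {"B": [], "W": []}
    for i in range(1, len(black_leads)):
        black_moved = (i % 2 == 1) == black_moves_first
        delta = black_leads[i] - black_leads[i - 1]
        if black_moved:
            losses["B"].append(max(0.0, -delta))  # Black loses when the lead drops
        else:
            losses["W"].append(max(0.0, delta))   # White loses when the lead rises
    return {color: (mean(vals) if vals else 0.0) for color, vals in losses.items()}


result = mean_point_loss([0.0, -0.5, 1.5, 0.5, 0.5])
print(result)  # {'B': 0.75, 'W': 1.0}
```

Averaging this over many games by players of the same rank would give the “points lost per move by a 7k” figure.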

I do have further ideas on how to expand this project, but that’s the first step. As you can guess, to be relevant, it’d require a metric buttload of SGFs. So far I’ve downloaded about 400,000 games, but I have a lot more to go ^^’

Once I get the first step completed, it’ll be a pleasure for me to share that SGF library with anyone who wants it, so they don’t have to literally (D)DoS OGS to get the games.
