Can we get an SGF database dump?

This sounds a bit rude to me. Is it really so?

I downloaded the 1k sample.
I can’t find a way to easily tell OGS games from uploaded SGF.

I found two games where I can’t read who won from data.
One was an ongoing game at the sampling moment.
The other is an uploaded SGF.
There are 4 uploaded SGFs with “Phase”=“PLAY” that are obviously finished.

I wish I could simply discard uploaded SGF, but I don’t know how.


Quickest scored games in that sample (under 1 min):

@za3k Awesome work!

I can’t find a way to easily tell OGS games from uploaded SGF.

@Lys It looks like uploaded SGF games have the original_sgf JSON key, and regular games do not. For example, the first occurrence of this in sample-1k.json is on line 187.

I found two games where I can’t read who won from data.

I’ve had some trouble/experience with that too. What are the game ids?

I don’t think there should be any expectation that this data is clean and straightforward. It’s simply the internal representation that has been useful for OGS, and it has changed over time.


How many games are played or ended every day?

Yes, just download the SGFs by date, and see how many files are in the folder for some recent day. This is what I would do myself to answer your question. Sorry if I came off as rude; I won’t have time to answer questions about this indefinitely, though.
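A minimal sketch of that file count, assuming the "by date" download unpacks into one folder per day with one `.sgf` file per game (the folder layout is an assumption, and `count_sgf_files` is a made-up helper name):

```python
from pathlib import Path

def count_sgf_files(day_folder) -> int:
    """Count the .sgf files sitting directly inside one day's folder."""
    return sum(
        1
        for entry in Path(day_folder).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".sgf"
    )
```

Calling it on the folder for a recent day gives the number of games recorded for that day, provided the one-file-per-game assumption holds.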

Since you are having trouble with OGS’s JSON format, I suggest taking advantage of the work hexadron and I did, and looking at the generated SGFs instead. The ones “by date” and “by player” already omit the uploaded SGFs, so they are more uniform for users like you. I should probably add a note on the site that the raw JSON format is not useful to most users. If you’re a programmer, you could also take a look at the code which makes SGFs from JSON.


That’s useful, thanks!
I overlooked that. It isn’t just a key: it holds the whole body of the SGF file. But it’s null for server games, which is useful for filtering the data.
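For anyone wanting to apply that filter, here is a rough sketch. It assumes the file holds one JSON object per line and that `original_sgf` sits at the top level of each record; both are guesses about the dump layout, and the function names are made up for illustration.

```python
import json

def is_uploaded_sgf(game: dict) -> bool:
    """True when the record carries a non-null original_sgf body."""
    # .get() covers both cases: key missing entirely, or key present but null.
    return game.get("original_sgf") is not None

def server_games(lines):
    """Yield parsed records for games played on the server.

    Assumes one JSON object per line (an assumption about the dump format).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        game = json.loads(line)
        if not is_uploaded_sgf(game):
            yield game
```

Passing an open file handle as `lines` keeps memory use low, since records are parsed one at a time.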

No problem

You are not responsible for issues in OGS data.
A little documentation about your own work would be useful though, at least for those like me who can’t read code.

I hope there’s someone else interested in crawling your database, in order to help each other just like @voltrevo just did above.

Got the 100k sample

Boards used in that sample:


I was really caught off guard by how unexpectedly beautiful this is… thank you.


Looks like there are a lot of regions left to explore! Who’s up for a 16x6 game?


I’m trying to download the full json (18GB).
First attempt by browser failed.
Last night I summoned my old BitTorrent client, which had been inactive for many years. It looks like in a few hours I’ll have my huge collection for my experiments. :man_scientist:

Edit: download completed.
18 GB compressed, 85 GB uncompressed.
I will drop almost everything from the “original SGF” field. I wonder how big the DB will be.


I randomly got curious about how often different intersections are played, so I did some analysis on the 100k sample and got this:

(brighter = played more frequently)

Not anything too surprising here, but still cool to look at I think :smiley:

Extra details

There were 47 602 games on size 19x19 in that sample (I did no filtering except checking board size).

I counted the number of times each intersection was played in total (so an intersection may be played multiple times during one game).
I have not averaged out symmetric moves - you can see that the image is not completely symmetric. It may be transposed and/or flipped vertically compared to the normal board orientation - seeing how small the differences were I didn’t bother to think about orienting it correctly.

Each 1-1 point was played in about 6.6% of the games.
Each 4-4 point was played in about 71.9% of the games.
(these percentages are averaged over all 4 points, and disregard the fact that they are played multiple times in some games)
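A minimal sketch of how such counts could be reproduced. It assumes each game is a dict with `width`, `height` and a `moves` list whose entries start with zero-based `x, y` coordinates, with passes stored as off-board coordinates; these field names and conventions are assumptions about the JSON, and this is not the code used for the images above. It tracks both total plays per intersection and the number of games in which each intersection was played at least once, since the percentages can be read either way.

```python
from collections import Counter

def intersection_stats(games, size=19):
    """Return (n_games, total_plays, games_with_point) for one board size."""
    total_plays = Counter()       # every play on a point counts
    games_with_point = Counter()  # each game counts a point at most once
    n_games = 0
    for game in games:
        if game.get("width") != size or game.get("height") != size:
            continue
        n_games += 1
        seen = set()
        for move in game.get("moves", []):
            x, y = move[0], move[1]
            if not (0 <= x < size and 0 <= y < size):
                continue  # skip passes or anything off the board
            total_plays[(x, y)] += 1
            seen.add((x, y))
        games_with_point.update(seen)
    return n_games, total_plays, games_with_point

def corner_average(games_with_point, n_games, offset, size=19):
    """Average share of games over the four symmetric corner points.

    offset is zero-based: offset=0 is the 1-1 point, offset=3 the 4-4 point.
    """
    if n_games == 0:
        return 0.0
    lo, hi = offset, size - 1 - offset
    points = [(lo, lo), (lo, hi), (hi, lo), (hi, hi)]
    return sum(games_with_point[p] for p in points) / (4 * n_games)
```

`total_plays` is what a brightness map would be drawn from, while `corner_average` gives a per-game share comparable to the 1-1 and 4-4 figures.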


This would make a cool mosaic pattern… Imagine tiling your bathroom with this :heart:


I think people would complain that your walls are blurry :stuck_out_tongue:


Then those people should start more fights in the centre :sweat_smile: haha


I think it would be cool to see what these would look like if we only looked at the 1st move, the first 5 moves, the first 10, the first 20, etc. Fuseki would certainly be around the corners and edges, while the mid-game would start to venture into the center. (Or even between moves, like 10 to 20, 20 to 30, etc., to see where fights usually start in the mid-game, or in the later half of the game to see where yose is usually played.)
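The move-window idea could be sketched roughly like this, again assuming each game is a dict with `width`, `height` and a `moves` list of entries starting with zero-based `x, y` (assumed field names, not confirmed from the dump). `plays_in_window` is a hypothetical helper.

```python
from collections import Counter

def plays_in_window(games, first, last, size=19):
    """Count plays per intersection for move numbers first..last.

    Move numbers are 1-based and inclusive. Passes keep their slot in the
    move list, so they still advance the move number but are not counted.
    """
    counts = Counter()
    for game in games:
        if game.get("width") != size or game.get("height") != size:
            continue
        moves = game.get("moves", [])
        for move in moves[first - 1:last]:
            x, y = move[0], move[1]
            if 0 <= x < size and 0 <= y < size:
                counts[(x, y)] += 1
    return counts
```

`plays_in_window(games, 1, 10)` would cover the first ten moves, `plays_in_window(games, 100, 109)` a mid-game slice, and `first == last` a single move number.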


Here are boards after move 10, 20, 30 and 40:


(would be cool to animate, but I don’t have a quick way to do it right now)

First 100 moves:

Only moves 100-109:

Only moves 200-209:


Would be cool to do this by historical period (before shin fuseki, shin fuseki to AI, AI to today)

Feel free to get bored


So nice to see how they visualize. Opening around the 3rd and 4th line, take the corner and then the edge, and mid-game fighting for the high group all the way to tengen, and yose on the very edges, but lots of dames in between.

I wonder if they could be used as a kind of fingerprint or snapshot of different players’ styles. Different styles would have different priorities at different stages of the game: territorial players, I would imagine, stay more on the edge, with maybe some play on lines 4-6 early on, while fighting-style players would run their stones toward the center early and backfill in the mid-game. And there are players who like to play unconventional styles, etc. (like the guy who always plays the Great Wall opening).


Now make it 18x18 :slight_smile:

These are simply beautiful. :heart_eyes:

I’m still struggling with the whole db.
Converting that huge JSON file to a DB isn’t straightforward.
I’m also discarding all moves and the original SGF to keep it as light as possible.
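One possible way to do that conversion, as a sketch only: it assumes the dump is one JSON object per line (unverified), that the heavy fields are named `moves` and `original_sgf`, and it stores each trimmed record as JSON text in SQLite. The table layout and the `load_games` name are invented for illustration.

```python
import json
import sqlite3

# Fields dropped to keep the database light (names are assumptions).
DROP_FIELDS = ("moves", "original_sgf")

def load_games(lines, db_path=":memory:"):
    """Stream records into SQLite, one row per game, without the heavy fields."""
    conn = sqlite3.connect(db_path)
    # The id column is just SQLite's own row number, not the OGS game id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games "
        "(id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
    )
    with conn:  # commits once at the end, rolls back on error
        for line in lines:
            line = line.strip()
            if not line:
                continue
            game = json.loads(line)
            for field in DROP_FIELDS:
                game.pop(field, None)
            conn.execute(
                "INSERT INTO games (data) VALUES (?)", (json.dumps(game),)
            )
    return conn
```

Passing an open file handle as `lines` means only one record is held in memory at a time, which matters at 85 GB. If the dump turns out to be a single large JSON array instead, a streaming parser would be needed in place of the line loop.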


Here are moves 2, 10 and 30 from 39k 9x9 games:
(these are just moves played on that exact movenumber, not moves up until that point)

Interactive diagram for moves 1-100