OGS Pattern Search Database

For my goals I removed all informations about moves and I really don’t know how to handle it.

I know from experience that sometimes we transform informations to make them more useful. A plain SGF sequence is hard to understand by just reading it. But I’m sure it can be changed into something more meaningful. Data can be structured in more complex ways than just mono dimensional lists or bi-dimentional tables.

An “index” is a way of organising data: it’s more data, made up studying original data, that allow to speed up research and query because they already contain part of the answer (which has been computed in advance).
Just as an example, an index could number increasingly records according to an ordering method (names in alphabetical order, games in score order and so on).
That means quicker results (consulting the index) but using bigger amount of data (the indexes themselves).

I don’t know how pattern search works though, and it would be awesome if someone could explain it.

To complement the conceptual explanations with a concrete example, here is a random game from the database:

{"score_stones":false,"auto_scoring_done":true,"allow_ko":false,"private":false,"height":9,"time_control":{"time_control":"byoyomi","period_time":30,"main_time":600,"periods":5,"system":"byoyomi","speed":"live"},"free_handicap_placement":false,"pause_control":{"stone-removal":true},"meta_groups":[192],"moves":[[6,2,9054],[6,6,2012],[2,3,10740],[2,5,3799],[6,4,12602],[5,4,4036],[5,3,7990],[4,4,4366],[4,3,9138],[1,3,7583],[1,2,10026],[1,4,3452],[6,5,8556],[5,6,3935],[3,2,8180],[3,4,1807],[7,5,8715],[7,6,9638],[0,2,8471],[3,3,3047],[7,3,12358],[2,4,2703],[2,2,11099],[8,5,2412],[8,4,11087],[8,6,2183],[0,3,18001],[0,4,2030],[5,5,17883],[4,6,4346],[4,5,18618],[3,6,4227],[3,5,17949],[8,2,1020],[6,1,11730],[8,3,2325],[7,4,8832],[4,2,12255],[4,1,9333],[2,6,7903],[-1,-1,10226],[-1,-1,24739]],"allow_superko":true,"score_passes":true,"clock":{"expiration_delta":518465,"current_player":58542,"title":"Partie amicale","paused_since":1449085915,"black_player_id":58542,"expiration":1449086433872,"last_move":1449085915406,"pause_delta":-406,"white_player_id":182607,"stone_removal_expiration":1449086215704,"game_id":3388758,"now":1449085915704,"black_time":{"thinking_time":368.466,"period_time":30,"periods":5},"stone_removal_mode":true,"white_time":{"thinking_time":490.1820000000002,"period_time":30,"periods":5}},"black_player_id":58542,"winner":182607,"aga_handicap_scoring":false,"white_player_id":182607,"width":9,"score":{"white":{"stones":0,"total":27.5,"handicap":0,"prisoners":0,"scoring_positions":"afbfagbgahbhchdhehfhghhhihaibicidieifigihiii","komi":5.5,"territory":22},"black":{"stones":0,"total":25,"handicap":0,"prisoners":3,"scoring_positions":"aabacadaeafagahaiaabbbcbdbfbhbibecfchcicidgd","komi":0,"territory":22}},"initial_state":{"white":"","black":""},"end_time":1449085915,"score_territory_in_seki":false,"automatic_stone_removal":false,"handicap":0,"start_time":1449085565,"score_prisoners":true,"disable_analysis":false,"allow_self_capture":false,"ranked":true,"komi":5.5,"game_id":3388758,"removed":"ecicid","strict_seki_mode":false,"opponent_plays_first_after_resume":true,"superko_algorithm":"ssk","white_must_pass_last":false,"rules":"japanese","paused_since":1449085915,"players":{"white":{"username":"Grobat","accepted_stones":"ecicid","rank":9,"egf":64.762,"professional":false,"id":182607,"accepted_strict_seki_mode":false},"black":{"username":"Natsu (Fuego)","accepted_stones":"ecicid","rank":8,"egf":-98.811,"professional":false,"id":58542,"accepted_strict_seki_mode":false}},"phase":"finished","game_name":"Partie amicale","score_territory":true,"outcome":"2.5 points","initial_player":"black","history":[],"conditional_moves":{"58542":{},"182607":{}},"pause_on_weekends":false}

Same data separated on different lines to make it easier to read
{
    "score_stones": false,
    "auto_scoring_done": true,
    "allow_ko": false,
    "private": false,
    "height": 9,
    "time_control": {
        "time_control": "byoyomi",
        "period_time": 30,
        "main_time": 600,
        "periods": 5,
        "system": "byoyomi",
        "speed": "live"
    },
    "free_handicap_placement": false,
    "pause_control": {
        "stone-removal": true
    },
    "meta_groups": [192],
    "moves": [
        [6, 2, 9054],
        [6, 6, 2012],
        [2, 3, 10740],
        [2, 5, 3799],
        [6, 4, 12602],
        [5, 4, 4036],
        [5, 3, 7990],
        [4, 4, 4366],
        [4, 3, 9138],
        [1, 3, 7583],
        [1, 2, 10026],
        [1, 4, 3452],
        [6, 5, 8556],
        [5, 6, 3935],
        [3, 2, 8180],
        [3, 4, 1807],
        [7, 5, 8715],
        [7, 6, 9638],
        [0, 2, 8471],
        [3, 3, 3047],
        [7, 3, 12358],
        [2, 4, 2703],
        [2, 2, 11099],
        [8, 5, 2412],
        [8, 4, 11087],
        [8, 6, 2183],
        [0, 3, 18001],
        [0, 4, 2030],
        [5, 5, 17883],
        [4, 6, 4346],
        [4, 5, 18618],
        [3, 6, 4227],
        [3, 5, 17949],
        [8, 2, 1020],
        [6, 1, 11730],
        [8, 3, 2325],
        [7, 4, 8832],
        [4, 2, 12255],
        [4, 1, 9333],
        [2, 6, 7903],
        [-1, -1, 10226],
        [-1, -1, 24739]
    ],
    "allow_superko": true,
    "score_passes": true,
    "clock": {
        "expiration_delta": 518465,
        "current_player": 58542,
        "title": "Partie amicale",
        "paused_since": 1449085915,
        "black_player_id": 58542,
        "expiration": 1449086433872,
        "last_move": 1449085915406,
        "pause_delta": -406,
        "white_player_id": 182607,
        "stone_removal_expiration": 1449086215704,
        "game_id": 3388758,
        "now": 1449085915704,
        "black_time": {
            "thinking_time": 368.466,
            "period_time": 30,
            "periods": 5
        },
        "stone_removal_mode": true,
        "white_time": {
            "thinking_time": 490.1820000000002,
            "period_time": 30,
            "periods": 5
        }
    },
    "black_player_id": 58542,
    "winner": 182607,
    "aga_handicap_scoring": false,
    "white_player_id": 182607,
    "width": 9,
    "score": {
        "white": {
            "stones": 0,
            "total": 27.5,
            "handicap": 0,
            "prisoners": 0,
            "scoring_positions": "afbfagbgahbhchdhehfhghhhihaibicidieifigihiii",
            "komi": 5.5,
            "territory": 22
        },
        "black": {
            "stones": 0,
            "total": 25,
            "handicap": 0,
            "prisoners": 3,
            "scoring_positions": "aabacadaeafagahaiaabbbcbdbfbhbibecfchcicidgd",
            "komi": 0,
            "territory": 22
        }
    },
    "initial_state": {
        "white": "",
        "black": ""
    },
    "end_time": 1449085915,
    "score_territory_in_seki": false,
    "automatic_stone_removal": false,
    "handicap": 0,
    "start_time": 1449085565,
    "score_prisoners": true,
    "disable_analysis": false,
    "allow_self_capture": false,
    "ranked": true,
    "komi": 5.5,
    "game_id": 3388758,
    "removed": "ecicid",
    "strict_seki_mode": false,
    "opponent_plays_first_after_resume": true,
    "superko_algorithm": "ssk",
    "white_must_pass_last": false,
    "rules": "japanese",
    "paused_since": 1449085915,
    "players": {
        "white": {
            "username": "Grobat",
            "accepted_stones": "ecicid",
            "rank": 9,
            "egf": 64.762,
            "professional": false,
            "id": 182607,
            "accepted_strict_seki_mode": false
        },
        "black": {
            "username": "Natsu (Fuego)",
            "accepted_stones": "ecicid",
            "rank": 8,
            "egf": -98.811,
            "professional": false,
            "id": 58542,
            "accepted_strict_seki_mode": false
        }
    },
    "phase": "finished",
    "game_name": "Partie amicale",
    "score_territory": true,
    "outcome": "2.5 points",
    "initial_player": "black",
    "history": [],
    "conditional_moves": {
        "58542": {},
        "182607": {}
    },
    "pause_on_weekends": false
}

So each game has a collection of “properties”, some of which are easier to understand than others. For example, there is a “width” and “height” property which describe the size of the board (the properties are not sorted in any particular order, so width and height are actually not next to each other).

Some properties are just a single value, such as a number ( komi = 5.5 ) or some text ( rules = “japanese” ).

Others are more complicated: the “players” property contains multiple pieces of information about both the black and the white player. In practice, if we had stored this game in a variable called “game”, and we wanted to pick out the username of the white player, we would write something like “game.players.white.username” (the “game” object has a “players” property, which in turn has a “white” property, which in turn has a “username” property). In this case, this would give us back the username “Grobat”.

The property “moves” contains a list of all the moves of the game, with three pieces of information for each. Below I cut out just the first two and last four moves of this game.

"moves": [
        [6, 2, 9054],
        [6, 6, 2012],
        ...
        [4, 1, 9333],
        [2, 6, 7903],
        [-1, -1, 10226],
        [-1, -1, 24739]
    ]

For each move, the first two numbers give the coordinates of the move, and I think the last number is the thinking time in milliseconds. A pass is encoded as a move with coordinates [-1, -1].

Now, if we want to do something like check how often tengen is played on the first move, all we need to figure out is how to tell the computer to go through all the games and check for how many the first entry in the “moves” property has the correct coordinates.

This will look sligthly different depending on the programming language, in Python it looks something like this:

import json
games = 0
with open('database.json') as file:
    for line in file:
        game = json.loads(line)
        if game['width'] == 19 and game['height'] == 19:
            move = game['moves'][0]
            if move[0] == 9 and move[1] == 9:
                games += 1
print(games)

After the computer has chewed through all the 27M games, this program will print out the number of games where tengen was the first move.

So with just a little bit of prior Python knowledge, a question like the above is very easy to answer, but doing pattern search is much more complicated. It’s not immediately obvious from the game data if a given pattern appears in the game: you would need to play through the game and see if the pattern shows up. But actually playing through all games everytime you want to do a search is too slow, so you need to store the games in some clever way to achieve something like waltheri. (I do not know this clever way myself :slight_smile:)

9 Likes

I don’t know how this is generally done, nor have I ever attempted to make something like that, but my first idea is to make use of Zobrist hashing, with perhaps some extra bit fiddling magic for masking to support matching partial positions besides full positions. I don’t know if that could actually work though.

2 Likes

See links in OGS Pattern Search Database - #21 by Uberdude

I made a game explorer years ago that looks through maybe 40,000 games pretty quickly:
http://go-moves-com.securec68.ezhostingserver.com/board.cfm?this=1&solved=0&attempts=0&mvs=&nxt=16160404031617040516

2 Likes

I think if we want it to be viable for such a large-scale database, it might be right to focus primarily on whole-board positions (since we already have a feature in place for joseki), this would allow us to more easily use a zobrist hashing system, although to detect transpositions I’d imagine it’s a good idea to have the “keys” of the hash have the same symmetries as the board (that is, you can efficiently transform a hashed position into the hash for each rotation and reflection of the same position, hopefully faster than doing those transformations and hashing those) just for ease of use (note that assuming the same starting position won’t catch all of these, even for simple situations such as double nirensei, as there are two identical positions that would not be caught by this approach based on the chirality of which star point was played first)

of course, the theoretical loss-less function of this type would be trinary digits arranged in their respective octet of the board with functions that permute these octets in accordance to D8,

but the trouble with trinary is fairly obvious when it comes to this, not to mention the lack of hashing makes the search less efficient, but I think it’s worth bringing up nonetheless

one idea to take this to an extreme is to use the natural translation from trinary to binary using each octet as a single number, concatenating the octets, and then using a random (or less than random) permutation of bits, which satisfies a lot of our properties in regards to permuting the 8 octets, and should distribute things a bit more evenly across the entire string, but has the downside (?) of most likely grouping together games based on the number of moves played so far, as well as not handling color-swapping easily if that’s important (and also you can’t just XOR in a move like you can in a zobrist hashing this way, but with a bit of group theory I think that’s fixable)

2 Likes

Hmm, I wonder whether OGS would mind a bit of data scraping to get an sgf collection.

There’s been a game dump in the past: Can we get an SGF database dump?

Might start with that set, and then contact anoek directly if newer data is needed. It would be a lot more efficient to dump than to scrape.

Edit: ohp I just read through that thread, and it wasn’t really a dump, but it looks like anoek did raise the rate limit for that effort.

1 Like

It would not be, having talked to anoek. It would need a custom dump tool written that doesn’t exist. If you need newer data, contact me directly and I can update that scrape sooner than planned.

3 Likes

See my edit