Python student project - data analysis

Hi all,

I am a Python student working on a data analysis project. I think querying all OGS games and analyzing them as a whole would be good practice. I am quite familiar with Python now, but not so much with JavaScript.

I found that what I am trying to do was already done by:

G0tstats is back! (with more stats)

But after looking into the source code of the gotstats project, I still don’t understand how the initial requests are made.

It appears that the useful link is:
https://online-go.com/api/v1/megames
and
https://online-go.com/api/v1/players/308797/full

Download user history - OGS Development - Online Go Forum (online-go.com)

Could anyone please teach me a little about how to query the api and get the json that is essentially the same as what you would be downloading from:

https://avavt.github.io/gotstats

Another thing: it appears that all the records are public. So I do not need to generate a client or log in to use the API? Because if I go to Got Stats?, all I need is anyone’s player ID.

This is related to Ogsapi - A Python API Wrapper for OGS, which seems to be much more complicated and does require a login. Thanks for educating me.

Lastly, gotstats generated 2900 games, while on my OGS profile page I only have 2700 games… I don’t know why.

I can’t speak much to your main issue, but I’m the maintainer of ogsapi, and you are correct that it requires an account. Thanks for mentioning it, I hadn’t considered unauthenticated requests. I can try to update it in the next few days so that the requests that don’t require auth can be called without an account.

However, if you do plan on grabbing all games from OGS, you might want to use the termination api: Can we get an SGF database dump? - #6 by anoek

And maybe let anoek know that you’ll be hammering the API for all games?

For the rest-API you don’t need a login or oauth token.
If you try to download the public games of a player, you are good there.

Is the point of your project to learn how to make these API queries or just to do data analysis on a large dataset? Does the dataset need to include practically all available games up to now? Or would just a ton of games up to late 2021 suffice?

If the latter is the case for both questions, maybe this would be helpful:

which is further discussed in this thread:

Thank you! That is what I figured. Just don’t know how to make the query to get a json file of ALL of ONE player’s games using python → requests

Thank you as well. Actually from the Got Stats? project, I can download a json file. And that json file is what I need for further analysis. And I am good to go using that json file. I can use python → json module for all kinds of data manipulation.

However I didn’t know - and still don’t know - how to get that file. Like @flovo has mentioned, I don’t know how to query the API using python. If I want to get a full game record, how? Elif I just want a sub list of certain year? Elif I want a record of player’s rank history? etc. So the question remains on the query side, while I think I have progressed quite a lot in the data analysis side.

https://online-go.com/api/v1/players/308797/games?page_size=100

The json has an entry next which holds the url to the next page if there are more games.
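
In Python, following that next link could look roughly like the sketch below. It assumes each page is a standard Django REST framework page, i.e. a dict with `count`, `next`, and `results`; the `results` key, the helper names `fetch_json` and `iter_player_games`, and the one-second pause between pages are assumptions for illustration, not something taken from OGS documentation.

```python
import json
import time
import urllib.request

API_ROOT = "https://online-go.com/api/v1"


def fetch_json(url):
    """Download one URL and decode the JSON body (public data, no login)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_player_games(player_id, page_size=100, fetch=fetch_json, pause=1.0):
    """Yield every game of one player by following the `next` links.

    Assumes each page is a dict with a `results` list and a `next` URL
    (None on the last page), as in Django REST framework pagination.
    """
    url = f"{API_ROOT}/players/{player_id}/games?page_size={page_size}"
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")
        if url and pause:
            time.sleep(pause)  # be gentle with the server between pages
```

Calling `list(iter_player_games(308797))` would collect all pages into one list, which can then be written out with `json.dump`. The `fetch` parameter exists only so the paging logic can be tried against canned pages without touching the network.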

If you are interested in the individual moves made in those games, you have to use the termination api for the games you found here.

Wow, that’s exactly what I need. Wish there is a documentation about different kinds of queries and brief explanations like you did. I looked long and hard before deciding to post here cuz I don’t want to trouble people if I can find them. Thank you!

P.s. can you point me to where I can find “termination” related docs?

Actually now you mentioned it, I found there are actually a lot of descriptions here:

My bad I didn’t know how to use it

^^ has links to more current documentation

Thank you all.

One last thing I couldn’t figure out. Using the api link for random player:
https://online-go.com/api/v1/players/{}/games/?page_size=100

player #1110905: my query returns 150 games in total, while the OGS profile page lists only 119
player #1301880: my query returns 284 games in total, while the OGS profile page lists only 237

I thought it was because of hidden private games, but I checked my own games when logged in:
my query returns 3771 (WOW) games in total, while the OGS profile page lists only 2777

Why are there fewer games on the OGS public pages than in the API results?

Where do you get the number of games from? The API call returns unranked games as well. The charts on the profile page only take ranked games into account.

aha, I think you called it. Compare these two:

Player Games – Django REST framework (online-go.com)

notice the ‘count’:258

Live to serve (online-go.com)

notice the 14 ranked…

Now I don’t know how to write a query that returns just unranked or canceled games, etc. I will need to look into that.

obviously i can get a huge json first and process everything in python afterwards but if the query can get a clean/filtered json in the first place that will certainly save a lot of time

You can filter the query for most properties in the games json

Unranked: https://online-go.com/api/v1/players/308797/games?page_size=100&ranked=false

This includes games initially created as unranked. Cancelled ranked games are still listed as ranked.

Annulled games: https://online-go.com/api/v1/players/308797/games?page_size=100&annulled=true

Canceled: https://online-go.com/api/v1/players/308797/games?page_size=100&outcome=Cancellation
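
For use from Python, the same filter URLs can be assembled with the standard library. This is only a sketch: the helper name `games_url` is made up, and it assumes nothing about the API beyond the three query parameters shown above.

```python
from urllib.parse import urlencode

API_ROOT = "https://online-go.com/api/v1"


def games_url(player_id, page_size=100, **filters):
    """Build a player-games URL with optional query filters.

    Booleans are written as lowercase true/false, matching the URLs above.
    """
    params = {"page_size": page_size}
    for name, value in filters.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return f"{API_ROOT}/players/{player_id}/games?{urlencode(params)}"


print(games_url(308797, ranked=False))
print(games_url(308797, annulled=True))
print(games_url(308797, outcome="Cancellation"))
```

The three printed URLs are the unranked, annulled, and cancelled queries listed above.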

Hello!

You can check the list of OGS APIs here: https://ogs.docs.apiary.io/

The main thing to note here is that this is the same API that OGS’s front end uses. So basically anything that can be done on the website can be done via one of these APIs. In fact, I think anoek made this with the intention of supporting developers of mobile apps.

Unfortunately we don’t have full params list for each API so much of the work is reading the FE’s source code and/or guess work.

Some endpoints are public and thus don’t require an Authorization header (Check the #Authorization section)

Generally it’s pretty simple to tell: if an API call involves taking an action on behalf of a user (playing a move, resigning, entering a tournament), then it’s not public. Otherwise it’s public.

I guess it’s because of the different parameters between the FE and my page.

Gotstats:
https://online-go.com/api/v1/players/197819/games/?page=1&page_size=50&ended__isnull=false&ordering=-ended&annulled=false
OGS:
https://online-go.com/api/v1/players/999999/games/?page=1&page_size=10&ended__isnull=false&ordering=-ended&source=play

I usually see the opposite situation: because Gotstats only checks for annulled=false games, the result count on Gotstats is usually smaller than on OGS.

For example my Chinitsu account has 225 games on OGS but only 215 of them are shown on Gotstats.

I suppose your 200-game difference comes from OGS’s source=play param, but I don’t know what that param means. It wasn’t there before afaik haha :sweat_smile: Will need some investigation.

@flovo
Thank you thank you thank you. Your education was always very concise and on point. Can’t thank you enough in this thread.

Thank you @Chinitsu . The API doc I went through didn’t help me that much like people in this thread did. Lol. I wish they have better documentation but people are so kind and helpful here .

Further parsing of the json shows that Cancellation and annulled are related, meaning these two always come together:

"outcome": "Cancellation",
"annulled": true

But ranked is not related to either. meaning both these are possible:

{
"ranked": true,
"outcome": "Cancellation",
"annulled": true
},
{
"ranked": false,
"outcome": "Cancellation",
"annulled": true
}

Gotstats eliminated all annulled games, therefore all Cancellation games.

Lastly the profile page on ogs e.g. “Total of: 2778 ranked games” means:
ranked=true&annulled=false

ran my python script to pull and count all the records and confirmed the number 2778 is correct
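
A minimal sketch of that kind of check, working on games that have already been downloaded into a list of dicts. The helper names and the sample records are made up for illustration; only the `ranked`, `outcome`, and `annulled` keys come from the json above.

```python
from collections import Counter


def tally_flags(games):
    """Count how often each (ranked, outcome, annulled) combination occurs."""
    return Counter(
        (game.get("ranked"), game.get("outcome"), game.get("annulled"))
        for game in games
    )


def count_profile_games(games):
    """Count games matching ranked=true&annulled=false."""
    return sum(
        1
        for game in games
        if game.get("ranked") is True and game.get("annulled") is False
    )


# Made-up sample records, only to show the shape of the data.
sample = [
    {"ranked": True, "outcome": "Resignation", "annulled": False},
    {"ranked": True, "outcome": "Cancellation", "annulled": True},
    {"ranked": False, "outcome": "Cancellation", "annulled": True},
    {"ranked": False, "outcome": "Resignation", "annulled": False},
]
for combination, number in tally_flags(sample).items():
    print(combination, number)
print("ranked and not annulled:", count_profile_games(sample))
```

Run over a full download, the tally shows which flag combinations actually occur, and the second function should reproduce the profile page total if the profile really uses ranked=true&annulled=false.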

interesting to learn more in the analyzing process.

may i ask what these params are:

ended__isnull=false,
ordering=-ended,

i cannot figure them out cuz they are not in the json

ended is a date. If the game has ended then this field will be the time when it has ended. If it’s null then the game has not ended.

ended__isnull=false just means ended != null.

ordering=-ended means descending order by ended. (ordering=ended would be ascending.)
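
To make the meaning concrete, here is roughly what those two params would do if applied locally to an already downloaded list of games. This is a sketch, not how the server implements it: it assumes `ended` is either null or an ISO 8601 timestamp string in one uniform format (so that sorting the strings sorts the dates), and the sample records are made up.

```python
def ended_newest_first(games):
    """Local equivalent of ended__isnull=false&ordering=-ended."""
    # ended__isnull=false: keep only games whose `ended` field is set.
    finished = [game for game in games if game.get("ended") is not None]
    # ordering=-ended: sort by `ended`, newest first.
    return sorted(finished, key=lambda game: game["ended"], reverse=True)


# Made-up records; real games carry many more fields.
sample_games = [
    {"id": 1, "ended": "2021-01-01T00:00:00Z"},
    {"id": 2, "ended": None},  # still in progress
    {"id": 3, "ended": "2022-05-05T12:00:00Z"},
]
print([game["id"] for game in ended_newest_first(sample_games)])  # [3, 1]
```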

About your further up post:

Cancelled games will be annulled, but games can be annulled for several reasons, not just cancellation.

For example someone cheated/sandbagged and the moderators annulled the result of a finished game afterward. Then the game will be annulled, but not cancelled.

Speaking from “senpai experience”: This is the best documentation quality you will ever see for, like, your entire career :stuck_out_tongue:
