I would like to retrieve some data from OGS in order to make some charts. For now I would be happy to get my own data (games, players involved, who won, board size and so on).
Thanks to @S_Alexander I got the very basics of it:
I can ask the server using an HTTP GET request (never heard of it before)
The answer can be in JSON format (never heard of it before)
In order to do this I have to send a sort of URL to the server (wow, that sounds easy)
Examples of these URLs can be found on the apiary (well, some seem broken, see later)
They can be checked using any browser (this is really easy)
Answers can easily become HUGE
So be careful not to overload the server
Some sort of authentication might be necessary to show that you're not trying to hack or crash the server
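For readers who do use Python, the first two points above (an HTTP GET request built from a URL, and an answer in JSON) fit in a few lines of the standard library. This is a minimal sketch: the base URL and the `players/123/games/` path are assumptions to check against the apiary or the API root, `123` is a placeholder player id, and nothing here handles authentication.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the OGS REST API; verify the real paths in a browser.
API_ROOT = "https://online-go.com/api/v1/"


def build_url(path: str, **params) -> str:
    """Join an API path (e.g. "players/123/games/") with query parameters."""
    url = urllib.parse.urljoin(API_ROOT, path)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(url: str, timeout: float = 30.0) -> dict:
    """Send ONE HTTP GET request and decode the JSON answer into Python objects."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `get_json(build_url("players/123/games/", page_size=10))` would send exactly one request; keeping `page_size` modest is the easy way to stop the answer from becoming HUGE.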
Now for the big question: how can I submit this request other than with my browser?
Some people use Python, but I don't.
I'm trying to use FME, which is software I use at work.
It's very nice, user friendly, and built specifically for retrieving, modifying and storing data.
I had never used FME's "HTTP Caller" before, but I can learn. So I did, and it worked!
Now I know how to:
send an HTTP GET request
flatten the JSON
extract the attributes that I want
store them in any database format that I like
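The "flatten, extract, store" steps that FME handles can also be sketched in Python, with SQLite from the standard library standing in for "any database format". The dotted field names in `WANTED` are hypothetical; inspect a real game record and adjust them.

```python
import sqlite3

# Hypothetical dotted field names: check them against a real game record.
WANTED = ["id", "width", "height",
          "players.black.username", "players.white.username", "outcome"]


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into one level with dotted keys.

    {"players": {"black": {"username": "a"}}} -> {"players.black.username": "a"}
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list items get their index as key: "x.0", "x.1"
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


def store_games(games, db_path=":memory:"):
    """Extract the WANTED attributes from each game and store one row per game."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games (id INTEGER PRIMARY KEY, width INTEGER,"
        " height INTEGER, black TEXT, white TEXT, outcome TEXT)"
    )
    for game in games:
        flat = flatten(game)
        row = [flat.get(field) for field in WANTED]  # missing fields become NULL
        # INSERT OR REPLACE keeps a single row per game id, even if a game
        # was downloaded twice.
        conn.execute("INSERT OR REPLACE INTO games VALUES (?, ?, ?, ?, ?, ?)", row)
    conn.commit()
    return conn
```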
So why am I here, still asking for help?
Well, I'm a kind of sorcerer's apprentice who can easily cause disasters, so I want to be careful.
Also, I think my story can be useful to others.
But let's focus on the "careful" thing!
What can I do to avoid overloading the server? How much data can I retrieve without doing harm (and possibly getting kicked off)?
The 54376 pages of the player list, for instance: would that be too much?
Also, is it split into pages only in the browser, or also for a direct GET request?
I dare not just try, because I don't want too many brooms flooding my cavern.
Can anybody help me?
I think it is a great question to be asking. You recognise the power of that rope you are holding.
If the server is properly designed, you should not be able to hurt OGS with a finite number of one-at-a-time requests.
So if you iterate through every player and all their games, asking for the next one only after you have received the previous one, and you stop after that (don't go on indefinitely), a well-designed server should be able to cope. Maybe it slows down a little while that's happening, but it ends.
What you must not do is go on indefinitely asking for stuff, and even more importantly, you mustn't ask for lots of stuff in parallel.
For example, if your environment (Python, which I know, or FME, which I don't) allows you to make many requests at the same time, or quickly in succession before the first one finishes, then that's nasty behaviour. Your numerous requests can swamp the server: this is how "denial of service" attacks work.
If you are asking for a finite amount of stuff, one thing at a time, I would have thought you should be OK.
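The rule in this answer (a finite list, one request at a time, the next only after the previous has returned) looks like this as a plain loop. `fetch` stands for any blocking function that performs one GET; the `max_requests` cap is an illustrative safety net, not an OGS rule.

```python
from typing import Callable, List


def fetch_sequentially(urls: List[str], fetch: Callable[[str], dict],
                       max_requests: int = 500) -> List[dict]:
    """Request each URL in turn; nothing runs in parallel.

    The list is finite, so the loop ends. The cap is an arbitrary guard
    against accidentally asking for "all the data".
    """
    if len(urls) > max_requests:
        raise ValueError(f"{len(urls)} requests planned, more than {max_requests}")
    results = []
    for url in urls:
        results.append(fetch(url))  # blocks until this response has arrived
    return results
```

The thing to avoid is handing the same list to a thread pool or an asynchronous client that fires all the requests at once.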
That being said, it would certainly be polite to ask @anoek directly before you decide to ask for "all the data", even one request at a time.
But my requests start immediately one after the other, because it's a loop. So it's 4 requests in a few seconds.
If the page_size were lower (say 10), that would be 40 requests in a few seconds.
I don't know which case is better for the server.
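The numbers in this question follow from a division: the number of requests is the number of items divided by `page_size`, rounded up. Assuming roughly 400 games (the count reported later in the thread) and a current `page_size` of 100 (inferred from "4 requests", not stated), the sketch reproduces both figures and shows how long the pauses add up to if each request waits a fixed delay after the previous one.

```python
import math


def request_count(total_items: int, page_size: int) -> int:
    """Pages needed to fetch total_items results, page_size at a time."""
    return math.ceil(total_items / page_size)


def total_wait_seconds(requests: int, delay_seconds: float) -> float:
    """Time spent pausing if each request after the first waits delay_seconds."""
    return max(requests - 1, 0) * delay_seconds


for size in (100, 10):
    n = request_count(400, size)
    print(f"page_size={size}: {n} requests, {total_wait_seconds(n, 15):.0f} s of pauses at 15 s")
```

The total amount of data is the same in both cases; with a pause between requests, the main difference is how long the download takes. Which page size is actually cheaper for the server is not something this arithmetic can settle.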
Also, I'm not so clever, so I need a lot of trial and error to get there in the end. So today I asked for my data many, many times (and still don't have it!!!).
All of it without authentication…
I feel very dumb now.
If you go to the API root in a browser, you can see the various paths through the API. It is useful for finding what is available and the proper path for the request.
Don't beat yourself up over it. As long as you don't ask for anything obviously huge (I don't know if it's possible, but asking for the complete list of players in one page would be bad), and as long as you keep the number of requests per minute reasonable, you should be fine.
I think being careful is a massive great first step.
It is also actually the responsibility of the server to protect itself against silly single requests. So if the server can't cope with being asked for the complete list of players in one page, it simply shouldn't honour that request. Typically that's done with a maximum page size.
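On the server side, a maximum page size amounts to a clamp on the requested value. A sketch, with `MAX_PAGE_SIZE` and the default of 10 as made-up numbers (the thread does not say what OGS actually allows). From the client's point of view it means that asking for a giant page simply returns a normal-sized one, so a loop still has to walk through the pages.

```python
MAX_PAGE_SIZE = 100     # hypothetical limit, not OGS's documented value
DEFAULT_PAGE_SIZE = 10  # hypothetical default


def effective_page_size(requested=None):
    """Return the page size the server will actually honour."""
    if requested is None or requested < 1:
        return DEFAULT_PAGE_SIZE
    return min(requested, MAX_PAGE_SIZE)  # silently cap oversized requests
```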
The biggest thing to be careful about is being sure that your system is not making many requests at once.
If it genuinely waits for one request to finish before sending the next one, and is not in an infinite loop, then that's pretty well behaved.
Some software will be able to make lots of requests, wait for them all to return, and assemble the result. Don't do that to the OGS server.
Another good guide is "don't do anything that takes a long time". If it takes a long time, you're probably making the server work hard, and that means other people might feel it.
The good news is that unlike Mickey, you can pull the power plug on all those brooms at once if it starts to go wrong. The symptom of things going wrong is "takes a long time".
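The "takes a long time" symptom can also be turned into an automatic plug-puller: time each response and stop the loop as soon as one is slow. The 10-second threshold is an arbitrary assumption, and `clock` is a parameter only so the sketch can be tested. This stops any further requests; a single hung request is cut off by the `timeout` argument of `urllib.request.urlopen` instead.

```python
import time
from typing import Callable, List


def fetch_with_brake(urls: List[str], fetch: Callable[[str], dict],
                     max_seconds: float = 10.0,
                     clock: Callable[[], float] = time.monotonic) -> List[dict]:
    """Fetch URLs one at a time and stop after the first slow response."""
    results = []
    for url in urls:
        start = clock()
        results.append(fetch(url))
        if clock() - start > max_seconds:
            break  # the server seems to be struggling: pull the plug
    return results
```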
I was going to make a pedantic response about infinity, but I feel that's not really helping anyone… I'll just sadly weep about the irrelevance of set theorists to daily-life problems instead.
I throttled the requests so as to make one every 15 seconds.
I made a loop that uses the "next" key, so that I can download all the data whatever the page length.
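A throttled loop that follows the "next" key, as described here, could look like this in Python. It assumes each page is a JSON object with a `results` list and a `next` URL that is null on the last page (the usual layout for paginated REST APIs; confirm it against a real OGS response). `fetch` is any blocking single-GET function, and `max_pages` is an extra, arbitrary stop so the loop can never run forever.

```python
import time
from typing import Callable, List, Optional


def fetch_all_pages(first_url: str, fetch: Callable[[str], dict],
                    delay_seconds: float = 15.0, max_pages: int = 1000,
                    sleep: Callable[[float], None] = time.sleep) -> List:
    """Collect the "results" of every page by following the "next" links."""
    items: List = []
    url: Optional[str] = first_url
    pages_done = 0
    while url and pages_done < max_pages:
        if pages_done > 0:
            sleep(delay_seconds)  # throttle: pause between requests, not before the first
        page = fetch(url)
        items.extend(page.get("results", []))
        url = page.get("next")  # null (None) on the last page ends the loop
        pages_done += 1
    return items
```

Because no request starts until the previous page has been processed, the loop stays strictly one-at-a-time, and with `delay_seconds=15` it makes at most four requests a minute.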
I downloaded summary data for all of my games and I'm about to do some stats on it.
Now I have just one big question mark…
On my profile page I can see:
11 ongoing games
my history, made of 38 pages of 10 games each + 7 games on the last page
That sums up to 398.
But I downloaded 400 games.
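The profile arithmetic checks out: 38 × 10 + 7 = 387 finished games, plus 11 ongoing, gives 398. One possible first check for the two extra downloads is whether any game id appears twice in the downloaded list. This is a guess, not something confirmed in the thread: with page-numbered results, a game starting or finishing during a slow, throttled download could shift the list so that one game lands on two consecutive pages. The sketch assumes each downloaded game has an `id` field.

```python
from collections import Counter

# Counts read off the profile page in the post above.
ONGOING = 11
FINISHED = 38 * 10 + 7  # 38 full history pages of 10 games, plus 7 on the last page
EXPECTED = ONGOING + FINISHED


def duplicated_ids(games):
    """Return, sorted, the game ids that occur more than once in the download."""
    counts = Counter(game["id"] for game in games)
    return sorted(game_id for game_id, n in counts.items() if n > 1)
```

If `duplicated_ids` returns an empty list, the two extra games have some other explanation.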
So, should I try to find the two ghost games?
((( BTW, G0t Stats says that the total number of games I have played on OGS is 371 )))
As you can see, correspondence games can take a very long time to finish.
I used Tableau software.
The dashboard on my PC is interactive: I can choose the board size from the filter at the top right, and the charts update to show just that portion of the data.
It's also possible to publish dashboards online, on the Tableau Public portal, so that everybody can navigate and interact with them.
1. using FME to retrieve tournament data periodically: CHECK
2. using the site data.world to share these data: NOPE
3. using Tableau to make some viz: CHECK
4. posting them on Tableau Public: NOPE
There's still much to do.
Right now I'm still exploring the data and trying to create some interesting charts.
Points 2 and 4 are completely new to me.
Point 2 requires that I learn some PostgreSQL.
I could try point 4 though, and see what happens.