Now for the big question: how can I submit this request other than with my browser?
Some people use Python, but I don’t.
I’m trying to use FME, which is a piece of software I use at work.
It’s very nice and user friendly, and it is designed for retrieving, modifying and storing data.
I had never used the FME “HTTP Caller” before, but I can learn. So I did, and it worked!
Now I know how to:
send an HTTP GET request
flatten the JSON
extract the attributes that I want
store them in any database format that I like
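For readers who do use Python, the same four steps can be sketched roughly as follows. This is only an illustration, not the FME workflow: the function names, the `records` table and the sample field names are hypothetical, and SQLite stands in for "any database format".

```python
import json
import sqlite3
import urllib.request


def fetch_json(url, timeout=30):
    """Send one HTTP GET request and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def flatten(obj, prefix=""):
    """Turn nested dicts/lists into one flat dict with dotted keys."""
    flat = {}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


def extract(flat, wanted):
    """Keep only the attributes of interest (missing ones become None)."""
    return {name: flat.get(name) for name in wanted}


def store(rows, wanted, db_path=":memory:"):
    """Write the extracted rows to a SQLite table named 'records'."""
    conn = sqlite3.connect(db_path)
    columns = ", ".join(f'"{name}"' for name in wanted)
    placeholders = ", ".join("?" for _ in wanted)
    conn.execute(f"CREATE TABLE IF NOT EXISTS records ({columns})")
    conn.executemany(
        f"INSERT INTO records VALUES ({placeholders})",
        [tuple(row[name] for name in wanted) for row in rows],
    )
    conn.commit()
    return conn
```

A typical use would be `flat = flatten(fetch_json(url))`, then `store([extract(flat, wanted)], wanted)`, where `url` and `wanted` are whatever endpoint and attribute names you are interested in.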
So why am I here, still asking for help?
Well, I’m a kind of sorcerer’s apprentice who can easily cause disasters, so I would like to be careful.
Also, I think my story can be useful to others.
But let’s focus on the “careful” thing!
What can I do to avoid overloading the server? How much data can I retrieve without doing harm (and possibly being kicked off)?
Would the 54376 pages of the player list, for instance, be too much?
Also, is it split into pages only for the browser, or also for a direct GET request?
I dare not just try, because I don’t want too many brooms flooding my cavern.
Can anybody help me?
I think it is a great question to be asking. You recognise the power of that rope you are holding.
If the server is properly designed, you should not be able to hurt OGS with a finite number of one-at-a-time requests.
So if you iterate through every player and all their games, asking for the next one only after you have received the previous one, and you stop after that (don’t go on indefinitely), a well-designed server should be able to cope. Maybe it slows down a little while that’s happening, but it ends.
What you must not do is go on indefinitely asking for stuff, and even more importantly, you mustn’t ask for lots of stuff in parallel.
For example, if your environment (python - I know what that is - or FME - I don’t know what that is) allows you to make many requests at the same time or quickly in succession before the first one finishes then that’s nasty behaviour. Your numerous requests can swamp the server: this is how “denial of service” attacks work.
If you are asking for a finite amount of stuff, one thing at a time, I would have thought you should be OK.
That being said, it would certainly be the polite thing to do to ask @anoek directly before you decide to ask for “all the data”, even one request at a time.
Don’t beat yourself up over it. As long as you don’t ask for anything obviously huge (I don’t know if it’s possible, but asking for the complete list of players in one page would be bad), and as long as you keep the number of requests per minute reasonable, you should be fine.
I think being careful is a massive great first step.
It is also actually the responsibility of the server to protect itself against silly single requests. So if the server can’t cope with being asked for the complete list of players in one page, it simply shouldn’t honour that request. Typically that’s done with a maximum page size.
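As a rough illustration of what a maximum page size means on the server side, here is a minimal sketch. The limit of 100 and the function name are made up for the example; I don’t know what OGS actually uses.

```python
MAX_PAGE_SIZE = 100  # hypothetical limit, not the real OGS value


def effective_page_size(requested, default=10, maximum=MAX_PAGE_SIZE):
    """Return the page size the server will actually honour."""
    if requested is None:
        return default  # the client did not ask for a specific size
    # Clamp the request into the range 1..maximum.
    return max(1, min(int(requested), maximum))
```

With a clamp like this, a client asking for a million players in one page simply gets the maximum instead.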
The biggest thing to be careful about is being sure that your system is not making many requests at once.
If it genuinely waits for one request to finish before sending the next one, and is not in an infinite loop, then that’s pretty well behaved.
Some software will be able to make lots of requests, wait for them all to return, and assemble the result. Don’t do that to the OGS server.
Another good guide is “don’t do anything that takes a long time”. If it takes a long time, you’re probably making the server work hard, and that means other people might feel it.
The good news is that unlike Mickey, you can pull the power plug on all those brooms at once if it starts to go wrong. The symptom of going wrong is “takes a long time”.
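In Python terms, the three rules (one request at a time, a finite number of them, and stop when things get slow) might look like the sketch below. The function name, the budget of 100 requests and the 10-second threshold are arbitrary assumptions, and `fetch` is whatever function you use to make a single request.

```python
import time


def fetch_one_at_a_time(urls, fetch, max_requests=100, slow_seconds=10.0,
                        clock=time.monotonic):
    """Call fetch(url) for each URL, strictly one after another.

    Stops early when the request budget is used up or when a single
    request takes longer than slow_seconds.
    """
    results = []
    for count, url in enumerate(urls):
        if count >= max_requests:
            break  # finite budget: never run indefinitely
        started = clock()
        results.append(fetch(url))  # blocks until the response has arrived
        if clock() - started > slow_seconds:
            break  # the server seems to be struggling: pull the plug
    return results
```

Because each call to `fetch` has to return before the loop moves on, there is never more than one request in flight.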
I throttled the requests so as to make one every 15 seconds.
I made a loop that follows the “next” keyword, so that I can download all the data, whatever the length of a page.
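For anyone doing the same thing outside FME, a throttled “next” loop could be sketched like this in Python. It assumes each page is a JSON object whose “next” field holds the URL of the following page (or nothing on the last page) and whose records sit under a “results” field; the “results” name, the function names and the cap of 1000 pages are assumptions for the example.

```python
import json
import time
import urllib.request


def get_json(url):
    """Fetch one page with a plain HTTP GET and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def download_all(first_url, fetch=get_json, delay_seconds=15, max_pages=1000,
                 sleep=time.sleep):
    """Follow the "next" links page by page, pausing between requests."""
    records = []
    url = first_url
    pages = 0
    while url and pages < max_pages:
        page = fetch(url)
        records.extend(page.get("results", []))
        url = page.get("next")  # None on the last page ends the loop
        pages += 1
        if url:
            sleep(delay_seconds)  # throttle: wait before the next request
    return records
```

The `max_pages` cap is a safety net, so that a mistake in the loop cannot turn it into an endless stream of requests.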
I downloaded summary data for all of my games and I’m about to do some stats on it.
Now I have just a big question mark…
In my profile page I can see:
11 ongoing games
my history made of 38 pages of 10 games each + 7 games in the last page
It sums up to 398
My downloaded games are 400.
So, should I try to find the two ghost games?
((( BTW, G0t Stats says that my total games played on OGS are 371 )))
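One cheap first check, before hunting for ghosts, is to redo the arithmetic and look for records that were downloaded twice. Duplicates are only one possible explanation, not a confirmed one, and the sketch assumes each downloaded record carries a unique `id` field (an assumption; adjust the key to whatever your data uses).

```python
from collections import Counter


def expected_total(full_pages, per_page, last_page, ongoing):
    """Games shown on the profile: finished history plus ongoing games."""
    return full_pages * per_page + last_page + ongoing


def duplicated_ids(games, key="id"):
    """Return the ids that occur more than once in the downloaded records."""
    counts = Counter(game[key] for game in games)
    return sorted(game_id for game_id, n in counts.items() if n > 1)
```

Here `expected_total(38, 10, 7, 11)` gives the profile figure, and `duplicated_ids(downloaded_games)` lists any ids that appear more than once in the download.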
1. using FME to retrieve tournament data periodically: CHECK
2. using the site data.world to share these data: NOPE
3. using Tableau to make some viz: CHECK
4. post them on Tableau Public: NOPE
There’s still much to do.
For now I’m still exploring the data and trying to create some interesting charts.
Points 2 and 4 are completely new to me.
Point 2 requires that I learn some PostgreSQL.
I could try point 4 though, and see what happens.