API seems to return duplicate game numbers when paging

hi, i started this old thread and sorta got something to work.

i am trying the following python program (please see below), but it seems that subsequent requests for the next page get the same game numbers on that page.

i can see this happening in the browser by using this link and changing the page number.

any pointers will be appreciated.

edit: using wget in the shell script below gets 1000 different game numbers.

thanks

> #!/bin/bash
> player_id=179
> # step 1
> # read games files and write:  games.<id>.json.page files
> rm game.*.json
> for page in {1..100}
> 	do
> 	echo "page: ${page}"
> 	url=https://online-go.com/api/v1/players/${player_id}/games?ordering=-ended%26ended__isnull%3Dfalse%26page=${page}
> 	echo "url: ${url}"
> 	wget -O "games.${player_id}.json.${page}" ${url}
> 	status=$? # save wget's exit status before another command overwrites $?
> 	#curl -o "games.${player_id}.json.${page}" ${url} # fails
> 	if [ ${status} -ne 0 ]
> 	  then echo "error code: ${status}"
> 	  else echo "ok"
> 	fi
> 	sleep 1
> 	done

gets:

D:\ray\dev\conradapps\scrapogsgames>bash 1.sh
rm: cannot remove ‘game.*.json’: No such file or directory
page: 1
url: https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page=1
--2019-10-22 16:35:46-- https://online-go.com/api/v1/players/179/games?ordering=-ended&ended__isnull=false&page=1
Resolving online-go.com (online-go.com)... 104.25.35.20, 104.25.34.20, 2606:4700:20::6819:2314, ...
Connecting to online-go.com (online-go.com)|104.25.35.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘games.179.json.1’

games.179.json.1 [ <=> ] 23.08K --.-KB/s in 0.004s

2019-10-22 16:35:47 (5.96 MB/s) - ‘games.179.json.1’ saved [23634]

ok
page: 2
url: https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page=2
--2019-10-22 16:35:49-- https://online-go.com/api/v1/players/179/games?ordering=-ended&ended__isnull=false&page=2
Resolving online-go.com (online-go.com)... 104.25.35.20, 104.25.34.20, 2606:4700:20::6819:2314, ...
Connecting to online-go.com (online-go.com)|104.25.35.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘games.179.json.2’

games.179.json.2 [ <=> ] 23.05K --.-KB/s in 0.006s

2019-10-22 16:35:50 (3.54 MB/s) - ‘games.179.json.2’ saved [23602]

ok

> import requests
> import json
> import time
> player_id="179"
> games={}
> for page in range(1,40):
> 	url="https://online-go.com/api/v1/players/"+player_id
> 	url=url+"/games?ordering=-ended%26ended__isnull%3Dfalse%26page="
> 	url=url+str(page)
> 	print("url:",url)
> 	response = requests.get(url)
> 	if response.status_code!=200:
> 		print("status code:",response.status_code)
> 		continue
> 	d = response.json() # turn page.text into a python object
> 	want=["related","players","id","outcome","black_lost","white_lost"]
> 	for key in d.keys():
> 		if key!="results":
> 			print(key,d[key])
> 		else:
> 			results=d[key]
> 			l=[]
> 			for result in results:
> 				game=result["related"]["detail"].split("/")[4]
> 				#print(game)
> 				l.append(game)
> 			print(l)
> 	#print(json.dumps(d, indent=2)) # pretty print the responses
> 	#pp=json.dumps(d, indent=2)
> 	print("------------------------------------------------------")
> 	time.sleep(1.)

this produces:

(base) D:\ray\dev\chandler\vspygym>py x.py
url: https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page=1
count 1847
next https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page%3D1&page=2
previous None
['7893638', '7894171', '7894473', '1103165', '7934051', '7936545', '7936544', '7918364', '7931604', '8050385']

url: https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page=2
count 1847
next https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page%3D2&page=2
previous None
['7893638', '7894171', '7894473', '1103165', '7934051', '7936545', '7936544', '7918364', '7931604', '8050385']

url: https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page=3
count 1847
next https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page%3D3&page=2
previous None
['7893638', '7894171', '7894473', '1103165', '7934051', '7936545', '7936544', '7918364', '7931604', '8050385']

url: https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page=4
count 1847
next https://online-go.com/api/v1/players/179/games?ordering=-ended%26ended__isnull%3Dfalse%26page%3D4&page=2
previous None
['7893638', '7894171', '7894473', '1103165', '7934051', '7936545', '7936544', '7918364', '7931604', '8050385']

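The URL in your Python script is wrong: %26 and %3D are the percent-encoded forms of & and =, so everything after ordering= is sent as one long ordering value and the page parameter never reaches the API. That is why every request returns the same first page.
If you change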

url=url+"/games?ordering=-ended%26ended__isnull%3Dfalse%26page="

to

url = url + "/games?ordering=-ended&ended__isnull=false&page="

it'll work.
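To make the difference visible without hitting the API, here is a minimal sketch using only Python's standard urllib.parse. It assumes the server splits the query string the usual way (on literal & and =), which matches the "next" links in the output above. The variable names are just for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Query string as written in the original script (separators percent-encoded).
broken = "ordering=-ended%26ended__isnull%3Dfalse%26page=2"
# Query string with literal & and = separators.
fixed = "ordering=-ended&ended__isnull=false&page=2"

broken_params = parse_qs(broken)
fixed_params = parse_qs(fixed)

print(broken_params)  # a single key: 'ordering'
print(fixed_params)   # three keys: 'ordering', 'ended__isnull', 'page'

# Building the query from a dict avoids hand-encoding mistakes.
built = urlencode({"ordering": "-ended", "ended__isnull": "false", "page": 2})
print(built)
```

In the broken case the parser never sees a page key at all, so changing the page number cannot change the result.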

Here is my version of your script:

import requests
import time

player_id = "179"
games = {}

url = "https://online-go.com/api/v1/players/{player_id}/games?ordering=-ended&ended__isnull=false&page_size=100".format(player_id=player_id)

while url is not None:
    print("url:", url)
    response = requests.get(url)
    if response.status_code != 200:
        print("status code:", response.status_code)
        break
    d = response.json()  # turn page.text into a python object
    url = d["next"]
    want = ["related", "players", "id", "outcome", "black_lost", "white_lost"]
    for key in d.keys():
        if key != "results":
            print(key, d[key])
        else:
            results = d[key]
            l = []
            for result in results:
                game = result["id"]
                # print(game)
                l.append(game)
            print(l)
    # print(json.dumps(d, indent=2)) # pretty print the responses
    # pp=json.dumps(d, indent=2)
    print("------------------------------------------------------")
    time.sleep(1.)

I increased page_size to 100 so fewer requests are needed,
read the game id directly from each result's "id" field, and
follow the "next" link from each response, so the script only makes as many requests as needed.
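If you want to reuse the "follow next until it is None" idea elsewhere, it could be wrapped up roughly like this. This is only a sketch: iter_game_ids and fetch_page are hypothetical names, the fake pages are made-up data standing in for the API, and the duplicate check is an extra I added so that repeated ids (the original symptom) would be skipped rather than collected twice.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_game_ids(first_url: str, fetch_page: Callable[[str], Dict]) -> Iterator[int]:
    """Yield each game id once, following the "next" link until it is None."""
    url: Optional[str] = first_url
    seen = set()
    while url is not None:
        page = fetch_page(url)
        for result in page["results"]:
            game_id = result["id"]
            if game_id not in seen:  # skip ids already yielded
                seen.add(game_id)
                yield game_id
        url = page.get("next")


# Stand-in for the real API: two fake pages keyed by URL (made-up data).
FAKE_PAGES = {
    "page1": {"next": "page2", "results": [{"id": 1}, {"id": 2}]},
    "page2": {"next": None, "results": [{"id": 2}, {"id": 3}]},
}

ids = list(iter_game_ids("page1", FAKE_PAGES.__getitem__))
print(ids)
```

Against the real API you would pass the first URL from the script above and something like lambda u: requests.get(u).json() as fetch_page, keeping the one-second sleep between requests.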


this works like a charm.
i am now getting 1831 games.
thanks!
