
How to loop a GET request in Python to receive all data from a paginated API in a dataframe

I need to pull all the data from an API with Python, but each page contains only 100 results, and I can't work out how to use a while loop to fetch every page so that all the data ends up in a single dataframe. The API uses this URL format: "https://www.api.url.com/sessions?apikey=xxxx&apisecret=xxx&fromdate=2018-11-11&todate=2019-01-31&page=1&country=US"

Both page and country are optional parameters.

I tried changing the API URL to use "page=1:160" and "page=1-160", but it only returned the first page.

Then I tried passing the page parameter separately, which raises a NameError:

IN:

response = requests.get("https://www.api.url.com/sessions?apikey=xxxx&apisecret=xxx&fromdate=2018-11-11&todate=2019-01-31",
                        params={'page': page})
data = response.json()
df=pd.DataFrame(data['Sessions'])
pd.options.display.max_rows = 2000

OUT:

NameError: name 'page' is not defined

Next, I tried the same approach starting from the API format the owner specified, but got a very similar error:

IN:

r_sessions = requests.get("https://www.api.url.com/sessions?apikey=xxxx&apisecret=xxx&fromdate=2018-11-11&todate=2019-01-31").json()
num_pages = r_sessions['last_page']
for page in range(2, num_pages + 1):
    r_sessions = requests.get("https://www.api.url.com/sessions?apikey=xxxx&apisecret=xxx&fromdate=2018-11-11&todate=2019-01-31", params={'page': page}).json()
    print(r_sessions['page'])

OUT:

KeyError: 'last_page'

I expected to get a dataframe containing all the results from the API, even though they are paginated. However, I can only get one page per API call. I know I need to loop, but I don't know how, since I don't know how many pages there are.

  1. The NameError is expected: in the params dict the key 'page' is quoted but the value page is not. Without single or double quotes, Python treats page as a variable name, and no such variable has been defined. Quoting the value would silence the error but would send the literal string "page" to the API, so assign a page number to the variable before making the request.

Corrected code without the NameError:

page = 1
response = requests.get("https://www.api.url.com/sessions?apikey=xxxx&apisecret=xxx&fromdate=2018-11-11&todate=2019-01-31",
                        params={'page': page})
data = response.json()
df = pd.DataFrame(data['Sessions'])
pd.options.display.max_rows = 2000
  2. The KeyError means the key 'last_page' does not exist in the JSON that was returned. You can use the simple code below to list the top-level keys of the response:

r_sessions = requests.get("https://www.api.url.com/sessions?apikey=xxxx&apisecret=xxx&fromdate=2018-11-11&todate=2019-01-31").json()
for i in r_sessions:
    print(i)
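
When the total number of pages is unknown, one possible approach is to keep requesting successive pages until one comes back with no rows, and build the dataframe once at the end. The sketch below assumes that each response keeps its rows under 'Sessions' (as in the question) and that a page past the end returns an empty list or no 'Sessions' key; that end-of-data behaviour is a guess about this API, not something confirmed here. The names fetch_page, fetch_all_sessions and max_pages are made up for illustration, and the URL and credentials are the placeholders from the question.

```python
import pandas as pd
import requests

# Placeholder endpoint and credentials copied from the question.
BASE_URL = "https://www.api.url.com/sessions"
BASE_PARAMS = {
    "apikey": "xxxx",
    "apisecret": "xxx",
    "fromdate": "2018-11-11",
    "todate": "2019-01-31",
}


def fetch_page(page):
    """Request a single page and return the parsed JSON body."""
    response = requests.get(
        BASE_URL, params={**BASE_PARAMS, "page": page}, timeout=30
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def fetch_all_sessions(get_page=fetch_page, max_pages=1000):
    """Collect rows page by page until a page comes back empty.

    Assumption: rows live under 'Sessions', and a page past the end
    returns an empty list (or omits the key entirely).
    """
    rows = []
    page = 1
    while page <= max_pages:  # hard cap guards against an endless loop
        batch = get_page(page).get("Sessions", [])
        if not batch:
            break
        rows.extend(batch)
        page += 1
    return pd.DataFrame(rows)
```

Calling df = fetch_all_sessions() would then return a single dataframe with the rows from every page. If the API instead repeats the last page for out-of-range page numbers, the empty-page check never fires and the loop only stops at max_pages; in that case a different stop condition is needed, for example a page that returns fewer than 100 rows.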
