I used the Guardian news API to fetch data. Its documentation says that results are returned as a paginated list containing, by default, 10 entries per page, and I get JSON output like this. The Guardian documentation can be found here.
{
  "response": {
    "status": "ok",
    "userTier": "developer",
    "total": 8174,
    "startIndex": 1,
    "pageSize": 10,
    "currentPage": 1,
    "pages": 818,
    "orderBy": "relevance",
    "results": []
  }
}
I want to collect all the data (8174 entries in this example) instead of just 10. Is there any way to fetch all of it?
I found the answer. By default the Guardian API returns 10 entries per page. We can override this default with the page-size parameter, set to the number of entries needed:
https://content.guardianapis.com/search?q={query}&page-size={data count}
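As a minimal sketch of building that URL in Python: the helper name build_search_url is made up for illustration, and the api-key parameter with the value "test" is a placeholder assumption here (substitute your own key). Only q and page-size come from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://content.guardianapis.com/search"

def build_search_url(query, page_size=10, api_key="test"):
    """Build a search URL with an explicit page-size (hypothetical helper).

    api_key is a placeholder assumption; use your own key.
    """
    params = {"q": query, "page-size": page_size, "api-key": api_key}
    # urlencode escapes the values and keeps the dict's insertion order
    return f"{BASE_URL}?{urlencode(params)}"

print(build_search_url("brexit", page_size=50))
```

Letting urlencode build the query string avoids hand-escaping queries that contain spaces or other special characters.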
Your solution will not work in all cases, since there is usually a limit to the page-size parameter. For the Guardian API this is 200 at the moment.
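To put a number on that limit, here is a rough sketch of how many requests the example from the question would take. The function name pages_needed is just illustrative, and 200 is the limit quoted above, which may change.

```python
import math

def pages_needed(total, page_size):
    """Number of pages required to cover `total` entries (illustrative helper)."""
    return math.ceil(total / page_size)

# 8174 entries at the default page size of 10, as in the question's JSON
print(pages_needed(8174, 10))   # 818
# the same 8174 entries at the maximum page size of 200
print(pages_needed(8174, 200))  # 41
```

So even at the maximum page size, the 8174 entries cannot be fetched in a single call.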
If you need more items than you can get in a single call to the API, simply iterate over the pages, either with a definite loop (if you know how many pages you need) or with an open-ended while loop if you want to grab everything, e.g.:
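import requests

# url and params (a dict of query parameters) are assumed to be defined already
current_page = 1
total_pages = 1  # placeholder, replaced by the value from the first response
while current_page <= total_pages:
    params["page"] = current_page  # ask the API for the current page
    try:
        r = requests.get(url, params=params)
        r.raise_for_status()
    except requests.exceptions.RequestException as err:
        raise SystemExit(err)  # stop on any failed request
    total_pages = r.json()['response']['pages']
    current_page += 1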
PS: it's always good to add a way out of your while loops in case something fails; you don't want to flood the API with requests forever!
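One possible way to build in such a way out is a hard cap on the number of pages. The sketch below is only an illustration under some assumptions: fetch_page, fetch_all and max_pages are invented names, the api-key parameter is assumed to be required, and the standard library's urllib stands in for requests so the example has no dependencies (requests.get would work just as well).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://content.guardianapis.com/search"

def fetch_page(params):
    """Fetch one page and return its "response" object (hypothetical helper)."""
    # urlopen raises an HTTPError on 4xx/5xx responses, which ends the loop below
    with urlopen(f"{API_URL}?{urlencode(params)}", timeout=10) as resp:
        return json.load(resp)["response"]

def fetch_all(query, api_key, page_size=200, max_pages=100, get_page=fetch_page):
    """Collect results from every page, stopping after max_pages at the latest."""
    results = []
    page, total_pages = 1, 1
    while page <= total_pages and page <= max_pages:
        response = get_page({
            "q": query,
            "api-key": api_key,      # assumption: your own key goes here
            "page-size": page_size,
            "page": page,
        })
        results.extend(response["results"])
        total_pages = response["pages"]  # reported by the API on every page
        page += 1
    return results
```

The get_page argument is only there so the loop can be exercised with a stub instead of the live API; with a real key, fetch_all("brexit", "YOUR_KEY") would be the whole call.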