
Get all articles from the Guardian API

I used the Guardian news API to fetch data. Its documentation says that results are returned as a paginated list containing, by default, 10 entries per page, and I get JSON output like the one below. The Guardian documentation can be found here.

{
    "response": {
        "status": "ok",
        "userTier": "developer",
        "total": 8174,
        "startIndex": 1,
        "pageSize": 10,
        "currentPage": 1,
        "pages": 818,
        "orderBy": "relevance",
        "results": []
    }
}

I want to collect all the data (a total of 8174 in this example) instead of just 10 entries. Is there any way to fetch all of it?

I found the answer. By default the Guardian API returns 10 entries per page. We can override the default with the page-size parameter, passing the number of entries we need.

https://content.guardianapis.com/search?q={query}&page-size={data count}
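As a minimal sketch of how such a URL could be assembled, the snippet below builds the query string with the standard library. The helper name build_search_url is hypothetical, the api-key value is a placeholder you would replace with your own key, and the page parameter is included on the assumption that you will later want to request pages other than the first.

```python
from urllib.parse import urlencode

BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(query, page_size=10, page=1, api_key="YOUR_API_KEY"):
    """Return a search URL with explicit paging parameters.

    build_search_url is a hypothetical helper; api_key is a placeholder.
    """
    params = {
        "q": query,
        "page-size": page_size,  # entries per page (the API default is 10)
        "page": page,            # 1-based page number
        "api-key": api_key,
    }
    # urlencode escapes the values, so queries with spaces are safe.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("debate", page_size=50))
```

Building the query string with urlencode rather than by hand avoids broken URLs when the search term contains spaces or other reserved characters.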

Your solution will not work in all cases, since there is usually a limit to the page-size parameter. For the Guardian API this is 200 at the moment.

If you need more items than you can get in a single call to the API, iterate over the pages by incrementing the page parameter: use a definite loop if you know how many pages you need, or an open-ended while loop if you want to grab everything, e.g.:

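import requests

# url is the search endpoint; params is a dict holding q, api-key, page-size, etc.
current_page = 1
total_pages = 1
results = []
while current_page <= total_pages:
    params['page'] = current_page  # request the next page on each pass
    try:
        r = requests.get(url, params=params, timeout=10)
        r.raise_for_status()
    except requests.exceptions.RequestException as err:
        raise SystemExit(err)  # stop instead of looping on a failing request
    response = r.json()['response']
    results.extend(response['results'])
    total_pages = response['pages']  # the real page count, known after the first call
    current_page += 1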

P.S. It is always good to add a way out of your while loops in case something fails; you don't want to flood the API with requests forever!
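One way to build in such a way out is to cap the number of requests, so a wrong page count or an unexpected response cannot keep the loop running. The sketch below is only an illustration: collect_all_results, fetch_page, and max_pages are hypothetical names, and fetch_page stands for any function that returns the decoded JSON for a given page (for example a thin wrapper around requests.get). With page-size set to 200, the 8174 results from the question would need 41 requests.

```python
def collect_all_results(fetch_page, max_pages=1000):
    """Collect results from every page, stopping after max_pages requests.

    fetch_page(page) is assumed to return the decoded JSON dict for that
    page, shaped like the response shown in the question.
    """
    results = []
    current_page = 1
    total_pages = 1  # placeholder until the first response reports the real count
    while current_page <= total_pages and current_page <= max_pages:
        response = fetch_page(current_page)["response"]
        if response.get("status") != "ok":
            break  # way out: stop on any non-ok response
        results.extend(response["results"])
        total_pages = response["pages"]
        current_page += 1
    return results


# Usage with a stand-in fetcher that fakes three pages of two entries each.
def fake_fetch(page):
    return {"response": {"status": "ok", "pages": 3,
                         "results": [f"p{page}-a", f"p{page}-b"]}}


print(len(collect_all_results(fake_fetch)))
```

Passing the fetcher in as an argument keeps the loop itself free of network code, which makes the stopping conditions easy to check without calling the real API.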

