
Loop for pagination params in requests.get

I want to parse vacancies, and my goal is to parse the vacancies of just one company.

import requests
from tqdm import tqdm_notebook
import pandas as pd
r = requests.get('https://api.hh.ru/vacancies?employer_id=80').json() 
r

If I do this I get only 20 vacancies by default (page 0), even though there are 488:

'found': 488

and

'page': 0,
'pages': 25,
'per_page': 20
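These fields fit together: the page count is simply the total number of results divided by the page size, rounded up. A quick check with the numbers reported above:

```python
import math

found, per_page = 488, 20            # values from the API response above
pages = math.ceil(found / per_page)  # 25 pages of 20 items cover 488 vacancies
print(pages)  # → 25, matching the 'pages' field
```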

I can make a loop:

vac = []
for i in tqdm_notebook(range(0, 25)):
    vac.append(requests.get("https://api.hh.ru/vacancies?employer_id=80", params={'page': i}).json())

But then I get just 25 responses (one for every page). Or I can do:

vac = []
for j in tqdm_notebook(range(0, 20)):
    for i in tqdm_notebook(range(0, 500)):
        vac.append(requests.get("https://api.hh.ru/vacancies?employer_id=80", params={'page': i, 'per_page': j}).json())

But this is a very expensive way; we repeat a lot of requests. How can I fix it?

You will need to set the page and per_page parameters manually, per the API's documentation. However, you don't need a loop for the per_page parameter — it should be a static number (20):

vac = []
for i in tqdm_notebook(range(0, 25)):
    vac.append(requests.get("https://api.hh.ru/vacancies?employer_id=80", params={'page': i, 'per_page':20}).json())

Also, consider making the range of pages dynamic, based on the 'pages' field of the first response instead of a hard-coded 25.
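Putting both points together, one way to sketch this (the endpoint and the 'pages'/'items' field names are taken from the responses above; the `get_json` hook is a hypothetical helper added here so the loop can be exercised without hitting the network):

```python
import requests

def fetch_all_vacancies(employer_id, get_json=None):
    """Collect every vacancy for one employer, sizing the loop
    from the 'pages' field of the first response."""
    if get_json is None:
        # Default: perform the real HTTP request against api.hh.ru.
        def get_json(params):
            resp = requests.get('https://api.hh.ru/vacancies', params=params)
            resp.raise_for_status()
            return resp.json()

    base = {'employer_id': employer_id, 'per_page': 20}
    first = get_json({**base, 'page': 0})
    items = list(first['items'])          # vacancies from page 0
    for page in range(1, first['pages']): # remaining pages, count taken from the API
        items.extend(get_json({**base, 'page': page})['items'])
    return items
```

This way only `pages` requests are made in total, and the result is a flat list of vacancies rather than a list of raw page responses.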
