Looping through multiple API links to get data? It seems to bring back data from only one link

Context

So I have scraped an API I found on a website, but it only returns 100 data points per request. I found the API request via Chrome's Inspect tool -> Network -> XHR on http://www.fao.org/faostat/en/?#data/QC.
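The copied XHR URL is long, but its query string can be split into named parameters to see which ones control paging. A minimal sketch using only the standard library; the URL below is a shortened version of the one in the question, keeping only a few of its parameters.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the copied XHR URL (most item/year codes omitted).
url = ("http://fenixservices.fao.org/faostat/api/v1/en/data/QC"
       "?area=81&element=2312%2C2510%2C2413&year=1961%2C1962"
       "&page_number=1&page_size=100&output_type=objects")

# parse_qs decodes %2C to "," and maps each parameter name to a list of values.
params = parse_qs(urlsplit(url).query)

print(params["page_number"])  # ['1']
print(params["page_size"])    # ['100']
print(params["element"])      # ['2312,2510,2413']
```

The page_number and page_size parameters are the ones that decide which slice of the dataset a single request returns.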

There are thousands of records. I realised that each response only shows 100 and the rest is on other pages, so I decided to get the API URLs for the other pages, put them in a list and request each one. I was not expecting much, but the requests went through.
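Rather than copying each page's URL from the browser, the URLs can be generated from one base URL by rewriting page_number. A minimal sketch; page_urls is a hypothetical helper, and it assumes the API pages its results purely through the page_number query parameter, as the copied URL suggests.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

def page_urls(base_url, pages):
    """Return one URL per page number by rewriting the page_number parameter.

    Hypothetical helper: assumes page_number is the only thing that changes
    between pages.
    """
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for page in pages:
        # Swap in the new page number, keep all other parameters in order.
        new_query = [(k, str(page) if k == "page_number" else v) for k, v in query]
        urls.append(urlunsplit(parts._replace(query=urlencode(new_query))))
    return urls

base = ("http://fenixservices.fao.org/faostat/api/v1/en/data/QC"
        "?area=81&element=2312%2C2510%2C2413&page_number=1&page_size=100")

for u in page_urls(base, range(1, 4)):
    print(u)
```

urlencode re-encodes the commas as %2C, so the first generated URL is identical to the base URL.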

I then converted the response to JSON and then to a pandas DataFrame. When I checked its info, it only had 100 rows, the same amount as before, although it did contain a different set of data (the page 2 data).
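Each page comes back as its own JSON object, so the records have to be collected into one list before building a single DataFrame. A minimal sketch, assuming (as the question's own Data['data'] line does) that every response has a 'data' key holding a list of records; combine_pages is a hypothetical helper and the sample pages are made up.

```python
def combine_pages(pages):
    """Concatenate the 'data' lists from several parsed JSON responses."""
    records = []
    for page in pages:
        # Each page is a dict like the one returned by r.json().
        records.extend(page.get("data", []))
    return records

# Made-up responses standing in for page 1 and page 2.
page_1 = {"data": [{"Year": "1961", "Value": "10"}, {"Year": "1962", "Value": "12"}]}
page_2 = {"data": [{"Year": "1963", "Value": "15"}]}

all_records = combine_pages([page_1, page_2])
print(len(all_records))  # 3
```

The combined list can then be passed to pd.json_normalize(all_records) to get one DataFrame covering every page.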

So I decided to write a function, define all the API URLs and then call the function in a for loop.

It still returns only one page of data.

This is the final code:

import requests
import pandas as pd

url = "http://fenixservices.fao.org/faostat/api/v1/en/data/QC?area=81&area_cs=FAO&element=2312%2C2510%2C2413&item=800%2C221%2C711%2C515%2C526%2C226%2C366%2C367%2C572%2C203%2C486%2C44%2C782%2C176%2C414%2C558%2C552%2C216%2C181%2C89%2C358%2C101%2C461%2C426%2C217%2C591%2C125%2C378%2C265%2C393%2C108%2C531%2C530%2C220%2C191%2C459%2C689%2C401%2C693%2C698%2C661%2C249%2C656%2C813%2C195%2C554%2C397%2C550%2C577%2C399%2C821%2C569%2C773%2C94%2C512%2C619%2C542%2C541%2C603%2C406%2C720%2C549%2C103%2C507%2C560%2C242%2C839%2C225%2C777%2C336%2C677%2C277%2C780%2C310%2C263%2C592%2C224%2C407%2C497%2C201%2C372%2C333%2C210%2C56%2C446%2C571%2C809%2C671%2C568%2C299%2C79%2C449%2C292%2C702%2C234%2C75%2C254%2C339%2C430%2C260%2C403%2C402%2C490%2C600%2C534%2C521%2C187%2C417%2C687%2C748%2C587%2C197%2C574%2C223%2C489%2C536%2C296%2C116%2C211%2C394%2C754%2C523%2C92%2C788%2C270%2C547%2C27%2C30%2C149%2C836%2C71%2C280%2C328%2C289%2C789%2C83%2C236%2C723%2C373%2C544%2C423%2C157%2C156%2C161%2C267%2C122%2C305%2C495%2C136%2C667%2C826%2C388%2C97%2C275%2C692%2C463%2C420%2C205%2C222%2C567%2C15%2C137%2C135&item_cs=FAO&year=1961%2C1962%2C1963%2C1964%2C1965%2C1966%2C1967%2C1968%2C1969%2C1970%2C1971%2C1972%2C1973%2C1974%2C1975%2C1976%2C1977%2C1978%2C1979%2C1980%2C1981%2C1982%2C1983%2C1984%2C1985%2C1986%2C1987%2C1988%2C1989%2C1990%2C1991%2C1992%2C1993%2C1994%2C1995%2C1996%2C1997%2C1998%2C1999%2C2000%2C2001%2C2002%2C2003%2C2004%2C2005%2C2006%2C2007%2C2008%2C2009%2C2010%2C2011%2C2012%2C2013%2C2014%2C2015%2C2016%2C2017%2C2018%2C2019&show_codes=true&show_unit=true&show_flags=true&null_values=false&page_number=1&page_size=100&output_type=objects"
url_2 = url.replace("page_number=1", "page_number=2")  # same request, second page

def get_data(page_url):
    # Request one page of the API using the browser headers copied from DevTools
    headers = {
        'Connection': 'keep-alive',
        'Accept': '*/*',
        'User-Agent': '(user-agent inserted here)',
        'Origin': 'http://www.fao.org',
        'Referer': 'http://www.fao.org/',
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
    }

    r = requests.get(page_url, headers=headers)
    return r

all = [url, url_2]

for l in all:
    get_data(l)


# Parse the JSON response into a Python dict
Data = r.json()

# See what we have scraped
Data.keys()

df = pd.json_normalize(Data['data'])

df.info()
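Why this shows only one page: the r inside get_data is a local variable, and the loop throws away each returned response, so the r on the Data = r.json() line is not the loop's result. If r was left over from an earlier notebook cell (an assumption about how the code was run; otherwise that line would raise NameError), that stale response is what gets parsed. A small stand-alone illustration, with strings in place of HTTP responses:

```python
# Pretend this value is left over from an earlier notebook cell.
r = "stale response from an earlier cell"

def get_data(page):
    r = f"response for page {page}"  # local variable, invisible outside get_data
    return r

for page in [1, 2]:
    get_data(page)  # the returned value is discarded on every iteration

print(r)  # still the stale value, not page 1 or page 2
```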

I also tried it without return r, and tried other for loops such as:

for x in range(2):
    get_data(url)
    get_data(url_2)

Problem

How can I use a for loop to GET the data from multiple pages of the same dataset on the website? The only alternative I see is creating new cells with different links.

Answer

You are not keeping the data returned by the function. You can collect it like this:

data = []
for l in all:
    x = get_data(l).json()
    data.append(x)

or, more concisely:

data = [get_data(l).json() for l in all]
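Instead of listing every URL by hand, the loop can keep requesting the next page until a page comes back short. A minimal sketch; it assumes the API marks the last page by returning fewer than page_size records (or none), and fetch_page is a hypothetical callable you would supply, for example one that calls get_data on a URL whose page_number is n and returns .json(). A fake fetcher stands in for the network here.

```python
def fetch_all_pages(fetch_page, page_size=100, max_pages=1000):
    """Collect the 'data' records from consecutive pages.

    fetch_page(n) must return the parsed JSON (a dict) for page n.
    Stopping at the first short page is an assumption about this API.
    """
    records = []
    for page_number in range(1, max_pages + 1):
        rows = fetch_page(page_number).get("data", [])
        records.extend(rows)
        if len(rows) < page_size:
            break  # short or empty page: assume there is nothing further
    return records

# Fake fetcher: 250 records served in pages of 100.
FAKE_ROWS = [{"id": i} for i in range(250)]

def fake_fetch(page_number, page_size=100):
    start = (page_number - 1) * page_size
    return {"data": FAKE_ROWS[start:start + page_size]}

rows = fetch_all_pages(fake_fetch)
print(len(rows))  # 250
```

The max_pages limit is a safety net so a misbehaving API cannot keep the loop running forever.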

Note: avoid using all as a variable name, because it shadows Python's built-in all() function.
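A short illustration of what the shadowing does, run at module level: once all is bound to a list, calling all() fails until the name is deleted or renamed (for example to urls).

```python
import builtins

all = ["url_1", "url_2"]  # shadows the built-in all() in this module

try:
    all([True, True])
except TypeError as exc:
    print("TypeError:", exc)  # 'list' object is not callable

del all  # remove the module-level name so lookups fall back to the built-in
print(all([True, True]))     # True
print(all is builtins.all)   # True
```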
