Looping through multiple API links to get data? Seems to be bringing back data from one link

Context

I scraped an API I found on a website, but it only returns 100 data points. I got the API request via:

The 'Inspect' panel of the Chrome browser -> Network -> XHR ( http://www.fao.org/faostat/en/?#data/QC ).

There are thousands of data points. I realised that the response only shows 100 and that the rest is on other pages, so I decided to get the API URLs for the other pages, place them in a list and call them. I was not expecting much, but the request went through.

I then converted the data to JSON and then to a pandas DataFrame. I checked the info and found only 100 rows, the original amount, although it did return a different data set (page 2 data).

So I thought I would create a function, define all the API URLs and then write a for loop that calls the function.

It still returns just one page of data.

This is the final code:

url = "http://fenixservices.fao.org/faostat/api/v1/en/data/QC?area=81&area_cs=FAO&element=2312%2C2510%2C2413&item=800%2C221%2C711%2C515%2C526%2C226%2C366%2C367%2C572%2C203%2C486%2C44%2C782%2C176%2C414%2C558%2C552%2C216%2C181%2C89%2C358%2C101%2C461%2C426%2C217%2C591%2C125%2C378%2C265%2C393%2C108%2C531%2C530%2C220%2C191%2C459%2C689%2C401%2C693%2C698%2C661%2C249%2C656%2C813%2C195%2C554%2C397%2C550%2C577%2C399%2C821%2C569%2C773%2C94%2C512%2C619%2C542%2C541%2C603%2C406%2C720%2C549%2C103%2C507%2C560%2C242%2C839%2C225%2C777%2C336%2C677%2C277%2C780%2C310%2C263%2C592%2C224%2C407%2C497%2C201%2C372%2C333%2C210%2C56%2C446%2C571%2C809%2C671%2C568%2C299%2C79%2C449%2C292%2C702%2C234%2C75%2C254%2C339%2C430%2C260%2C403%2C402%2C490%2C600%2C534%2C521%2C187%2C417%2C687%2C748%2C587%2C197%2C574%2C223%2C489%2C536%2C296%2C116%2C211%2C394%2C754%2C523%2C92%2C788%2C270%2C547%2C27%2C30%2C149%2C836%2C71%2C280%2C328%2C289%2C789%2C83%2C236%2C723%2C373%2C544%2C423%2C157%2C156%2C161%2C267%2C122%2C305%2C495%2C136%2C667%2C826%2C388%2C97%2C275%2C692%2C463%2C420%2C205%2C222%2C567%2C15%2C137%2C135&item_cs=FAO&year=1961%2C1962%2C1963%2C1964%2C1965%2C1966%2C1967%2C1968%2C1969%2C1970%2C1971%2C1972%2C1973%2C1974%2C1975%2C1976%2C1977%2C1978%2C1979%2C1980%2C1981%2C1982%2C1983%2C1984%2C1985%2C1986%2C1987%2C1988%2C1989%2C1990%2C1991%2C1992%2C1993%2C1994%2C1995%2C1996%2C1997%2C1998%2C1999%2C2000%2C2001%2C2002%2C2003%2C2004%2C2005%2C2006%2C2007%2C2008%2C2009%2C2010%2C2011%2C2012%2C2013%2C2014%2C2015%2C2016%2C2017%2C2018%2C2019&show_codes=true&show_unit=true&show_flags=true&null_values=false&page_number=1&page_size=100&output_type=objects"
url_2 ="http://fenixservices.fao.org/faostat/api/v1/en/data/QC?area=81&area_cs=FAO&element=2312%2C2510%2C2413&item=800%2C221%2C711%2C515%2C526%2C226%2C366%2C367%2C572%2C203%2C486%2C44%2C782%2C176%2C414%2C558%2C552%2C216%2C181%2C89%2C358%2C101%2C461%2C426%2C217%2C591%2C125%2C378%2C265%2C393%2C108%2C531%2C530%2C220%2C191%2C459%2C689%2C401%2C693%2C698%2C661%2C249%2C656%2C813%2C195%2C554%2C397%2C550%2C577%2C399%2C821%2C569%2C773%2C94%2C512%2C619%2C542%2C541%2C603%2C406%2C720%2C549%2C103%2C507%2C560%2C242%2C839%2C225%2C777%2C336%2C677%2C277%2C780%2C310%2C263%2C592%2C224%2C407%2C497%2C201%2C372%2C333%2C210%2C56%2C446%2C571%2C809%2C671%2C568%2C299%2C79%2C449%2C292%2C702%2C234%2C75%2C254%2C339%2C430%2C260%2C403%2C402%2C490%2C600%2C534%2C521%2C187%2C417%2C687%2C748%2C587%2C197%2C574%2C223%2C489%2C536%2C296%2C116%2C211%2C394%2C754%2C523%2C92%2C788%2C270%2C547%2C27%2C30%2C149%2C836%2C71%2C280%2C328%2C289%2C789%2C83%2C236%2C723%2C373%2C544%2C423%2C157%2C156%2C161%2C267%2C122%2C305%2C495%2C136%2C667%2C826%2C388%2C97%2C275%2C692%2C463%2C420%2C205%2C222%2C567%2C15%2C137%2C135&item_cs=FAO&year=1961%2C1962%2C1963%2C1964%2C1965%2C1966%2C1967%2C1968%2C1969%2C1970%2C1971%2C1972%2C1973%2C1974%2C1975%2C1976%2C1977%2C1978%2C1979%2C1980%2C1981%2C1982%2C1983%2C1984%2C1985%2C1986%2C1987%2C1988%2C1989%2C1990%2C1991%2C1992%2C1993%2C1994%2C1995%2C1996%2C1997%2C1998%2C1999%2C2000%2C2001%2C2002%2C2003%2C2004%2C2005%2C2006%2C2007%2C2008%2C2009%2C2010%2C2011%2C2012%2C2013%2C2014%2C2015%2C2016%2C2017%2C2018%2C2019&show_codes=true&show_unit=true&show_flags=true&null_values=false&page_number=1&page_size=100&output_type=objects" 

def get_data(i):
    payload={}
    headers = {
      'Connection': 'keep-alive',
      'Accept': '*/*',
      'User-Agent': (user-agent inserted here),
      'Origin': 'http://www.fao.org',
      'Referer': 'http://www.fao.org/',
      'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
    }
    
    r = requests.get(i, headers=headers)
    return r

all = [url, url_2]

for l in all:
    get_data(l)


# Save the data in a json format
Data = r.json()

# See what we have scraped
Data.keys()

df = pd.json_normalize(Data['data'])

df.info()

I also tried it without return r. I tried other for loops like

for x in range(2):
    get_data(url)
    get_data(url_2)

Problem

How could I write a for loop to GET the data from multiple pages of the same webpage on the website? The only alternative I see is creating new cells with different links.

You are not capturing the data returned by the function: the loop calls get_data but discards its return value. You can keep it by doing

data = []
for l in all:
  x = get_data(l).json()
  data.append(x)

or just

data = [get_data(l).json() for l in all]
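Once every page's JSON is collected in a list, the pages still need to be merged before calling df.info(). The following is a minimal sketch of one way to do that, assuming each page is a dict with a top-level 'data' key as in the question's code. The helper name pages_to_frame and the sample records are invented for illustration and are not part of the FAO response.

```python
import pandas as pd

def pages_to_frame(pages):
    """Flatten the 'data' list of each JSON page and stack the results."""
    frames = [pd.json_normalize(page["data"]) for page in pages]
    # ignore_index gives the combined frame one continuous row index
    return pd.concat(frames, ignore_index=True)

# Stand-in payloads shaped like the question's response (a 'data' key holding
# a list of records); the field names inside are made up for this example.
sample_pages = [
    {"data": [{"Item": "Maize", "Year": 1961}, {"Item": "Rice", "Year": 1961}]},
    {"data": [{"Item": "Yams", "Year": 1962}]},
]

combined = pages_to_frame(sample_pages)
print(len(combined))
```

With the real responses, the list built by the comprehension above would be passed in place of sample_pages.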

Note: avoid using all as a variable name, because it shadows Python's built-in all() function.
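Separately, both URLs as listed in the question end in page_number=1, so each request has to carry a different page number for the pages to differ. Rather than pasting one URL per page, the page_number query parameter can be rewritten in a loop. This is only a sketch: with_page and fetch_all_pages are hypothetical helpers, and stopping when a page comes back with an empty 'data' list is an assumption about how the API behaves past the last page, not something the question confirms.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_page(url, page_number):
    """Return url with its page_number query parameter set to page_number."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "page_number"]
    query.append(("page_number", str(page_number)))
    return urlunsplit(parts._replace(query=urlencode(query)))

def fetch_all_pages(base_url, fetch_json, max_pages=50):
    """Request pages 1, 2, ... until a page has no rows or max_pages is hit.

    fetch_json is any callable that takes a URL and returns the decoded JSON,
    for example: lambda u: get_data(u).json()
    """
    rows = []
    for page in range(1, max_pages + 1):
        payload = fetch_json(with_page(base_url, page))
        batch = payload.get("data", [])
        if not batch:  # assumed end-of-data signal
            break
        rows.extend(batch)
    return rows
```

Passing the fetcher in as an argument keeps the paging logic independent of the headers and session setup in get_data, and max_pages guards against looping forever if the end-of-data assumption turns out to be wrong.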
