Python loop with requests.get() only returns the first page's results
I am trying to scrape a table from multiple webpages and store the results in a list. Instead, the list contains the results from the first webpage 3 times.
import pandas as pd
import requests
from bs4 import BeautifulSoup

dflist = []
for i in range(1,4):
    s = requests.Session()
    res = requests.get(r'http://www.ironman.com/triathlon/events/americas/ironman/world-championship/results.aspx?p=' + str(i) + 'race=worldchampionship&rd=20181013&agegroup=Pro&sex=M&y=2018&ps=20#axzz5VRWzxmt3')
    soup = BeautifulSoup(res.content,'lxml')
    table = soup.find_all('table')
    dfs = pd.read_html(str(table))
    dflist.append(dfs)
    s.close()
print(dflist)
You left out the & after '?p=' + str(i), so your requests all have p set to ${NUMBER}race=worldchampionship, which ironman.com presumably can't make sense of and just ignores. That is why every iteration returns the first page. Insert a & at the beginning of 'race=worldchampionship'.
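To see the effect of the missing & without contacting the server, the query string can be parsed locally. This is a minimal sketch using only the standard library; the helper name page_query and the shortened parameter list are illustrative assumptions, not part of the original code.

```python
from urllib.parse import urlsplit, parse_qs

BASE = "http://www.ironman.com/triathlon/events/americas/ironman/world-championship/results.aspx"


def page_query(i, fixed):
    """Build the URL the way the question does and return its parsed query.

    `page_query` is a hypothetical helper; only two of the original
    parameters are kept to make the output easy to read.
    """
    sep = "&" if fixed else ""  # the separator the original code omitted
    url = BASE + "?p=" + str(i) + sep + "race=worldchampionship&rd=20181013"
    return parse_qs(urlsplit(url).query)


broken = page_query(2, fixed=False)
fixed = page_query(2, fixed=True)

print(broken["p"])  # ['2race=worldchampionship']
print(fixed["p"])   # ['2']
```

In the broken version there is no separate race parameter at all: its text has been absorbed into the value of p.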
To prevent this sort of mistake in the future, you can pass the URL's query parameters as a dict to the params keyword argument, and requests will build and encode the query string for you, like so:
params = {
    "p": i,
    "race": "worldchampionship",
    "rd": "20181013",
    "agegroup": "Pro",
    "sex": "M",
    "y": "2018",
    "ps": "20",
}
res = requests.get(r'http://www.ironman.com/triathlon/events/americas/ironman/world-championship/results.aspx#axzz5VRWzxmt3', params=params)
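Putting the params approach back into the original loop might look like the sketch below. The function names page_params and fetch_tables are hypothetical, the timeout value is an arbitrary choice, and whether the site still serves this URL is not something the sketch can guarantee. The third-party imports sit inside fetch_tables so that the parameter helper can be checked without them; the urlencode preview comes from the standard library and only approximates what requests sends.

```python
from io import StringIO
from urllib.parse import urlencode

URL = "http://www.ironman.com/triathlon/events/americas/ironman/world-championship/results.aspx"


def page_params(page):
    """Query parameters for one results page (hypothetical helper)."""
    return {
        "p": page,
        "race": "worldchampionship",
        "rd": "20181013",
        "agegroup": "Pro",
        "sex": "M",
        "y": "2018",
        "ps": "20",
    }


def fetch_tables(pages):
    """Fetch each page and collect every table found as a DataFrame."""
    # Imported here so page_params() works without these packages installed.
    import pandas as pd
    import requests

    tables = []
    with requests.Session() as session:  # one session reused for all pages
        for page in pages:
            res = session.get(URL, params=page_params(page), timeout=10)
            res.raise_for_status()  # fail loudly on HTTP errors
            # read_html returns a list of DataFrames, one per <table>
            tables.extend(pd.read_html(StringIO(res.text)))
    return tables


# Preview of the query string for page 2, built with the standard library.
print(urlencode(page_params(2)))
# p=2&race=worldchampionship&rd=20181013&agegroup=Pro&sex=M&y=2018&ps=20
```

Calling fetch_tables(range(1, 4)) would request pages 1 to 3. Note that the session created in the question was never used, because the request went through requests.get() instead of s.get().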