How do I pass a user-agent to pandas' pd.read_html()?
Some websites automatically decline requests that lack a user-agent, and it's a hassle to use bs4 to scrape many different types of tables.
This issue was previously resolved with this code:
url = 'http://finance.yahoo.com/quote/A/key-statistics?p=A'
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open(url)
tables = pd.read_html(response.read())
However, urllib2 has been deprecated and urllib3 doesn't have a build_opener() attribute, and I could not find an equivalent either, even though I'm sure it has one.
read_html() accepts a URL or a string, so you can set the headers on the request yourself and let pandas read the response as text:
import pandas as pd
import requests
url = 'http://finance.yahoo.com/quote/A/key-statistics?p=A'
response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
tables = pd.read_html(response.text)
print(tables)
If you look at the read_html() signature, none of its options accepts headers as an argument, so just set the headers on the request.
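For anyone who wants to stay close to the original urllib2 snippet, a rough Python 3 equivalent is sketched below. In Python 3, build_opener() lives in the standard-library module urllib.request (urllib3 is a separate third-party package). The helper names make_opener, fetch_html and read_tables are made up for this illustration, and the io.StringIO wrapper is there on the assumption that you are on a recent pandas release, which warns when read_html() is given a literal HTML string.

```python
import io
import urllib.request


def make_opener(user_agent="Mozilla/5.0"):
    """Build an opener that sends the given User-Agent (hypothetical helper)."""
    opener = urllib.request.build_opener()
    # Replaces the default "Python-urllib/x.y" header that some sites reject.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    opener = make_opener(user_agent)
    with opener.open(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def read_tables(url, user_agent="Mozilla/5.0"):
    """Fetch a page with a custom User-Agent and parse its tables."""
    # Imported here so the two helpers above work without pandas installed.
    import pandas as pd

    html = fetch_html(url, user_agent)
    # A file-like object is accepted by read_html() across pandas versions.
    return pd.read_html(io.StringIO(html))
```

Calling read_tables(url) with the URL from the question should then return a list of DataFrames, provided the page still serves its tables as plain HTML. The requests version above is shorter; this one is mainly useful if you prefer to avoid the extra dependency.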