Pandas read_html gives me Permission denied (403)
I signed up with a provider to get currency prices. When I use pd.read_html('URL') I get a 403 error (permission denied), so I tried to emulate a browser like this:
import pandas as pd
import matplotlib.pyplot as plt
import html5lib
import requests
%matplotlib inline
### Pretend to be a browser ###
url = 'URL_TO_PROVIDER_WITH_TOKEN'
header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36","X-Requested-With": "XMLHttpRequest"}
r = requests.get(url, headers=header)
currency = pd.read_html('r')
However, this gives me "no tables found". The source looks like this:
{"status":true,"currency":[{"currency":"GBP\/CAD","value":"1.7136","date":"2019-01-18 17:19:58","type":"original"}]}
What am I doing wrong?
If there are no tables in the source, how can I get the data into pandas? As you can see, the data I would like to parse is JSON:
{"status":true,"currency":[{"currency":"GBP\/CAD","value":"1.7136","date":"2019-01-18 17:19:58","type":"original"}]}
OK, evidently the source was not HTML and had no tables in it, so JSON was the way to go. I managed to load the JSON structure with:
r = requests.get(url, headers=header).json()
But then I got stuck. The output of r looks like this:
{'status': True,
'currency': [{'currency': 'GBP/CAD',
'value': '1.7083',
'date': '2019-01-18 22:59:58',
'type': 'original'}]}
How do I get these into DataFrame columns? I want 'currency': 'GBP/CAD', 'value': '1.7083' and 'date': '2019-01-18 22:59:58'.
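import pandas as pd
import requests
url = 'URL_API_TOKEN'
header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36","X-Requested-With": "XMLHttpRequest"}
r = requests.get(url, headers=header).json()
# Flatten the list of records under 'currency' into a DataFrame (pd.json_normalize needs pandas >= 1.0)
data = pd.json_normalize(r['currency'])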
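A minimal offline sketch of the same step, using the sample payload from the question in place of the live response, so no token or network call is needed. The column selection and the type conversions are assumptions about what is wanted, not part of the original answer, and pd.json_normalize assumes pandas 1.0 or newer.

```python
import pandas as pd

# Sample payload copied from the question; stands in for requests.get(...).json()
r = {
    "status": True,
    "currency": [
        {
            "currency": "GBP/CAD",
            "value": "1.7083",
            "date": "2019-01-18 22:59:58",
            "type": "original",
        }
    ],
}

# One row per entry in the 'currency' list, one column per key
data = pd.json_normalize(r["currency"])

# Keep only the columns asked for in the question
data = data[["currency", "value", "date"]]

# The API returns strings; convert them to float and datetime (assumed to be desired)
data["value"] = data["value"].astype(float)
data["date"] = pd.to_datetime(data["date"])

print(data)
```

Because the records here are flat, pd.DataFrame(r["currency"]) should give the same columns; json_normalize mainly pays off when the records contain nested dictionaries.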
Try currency = pd.read_html(r.text) instead of currency = pd.read_html('r'). The quoted form passes the literal string "r" to read_html, not the variable r. Since r is a requests Response object, pass its body via r.text rather than the object itself. Note that this only helps when the body is HTML containing a table; for the JSON shown in the question, read_html will still report that no tables were found.
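As a rough sketch of that distinction, the hypothetical helper body_to_frame below (not part of pandas or requests) takes the response body as text, tries JSON first, and falls back to pd.read_html only when the body is not JSON. The "currency" key comes from the payload in the question; the read_html branch assumes an HTML parser such as lxml is installed.

```python
import json
from io import StringIO

import pandas as pd


def body_to_frame(text, records_key="currency"):
    """Hypothetical helper: build a DataFrame from a response body."""
    try:
        payload = json.loads(text)
    except ValueError:
        # Not JSON: assume HTML and take the first table.
        # read_html itself raises ValueError if the page has no tables.
        return pd.read_html(StringIO(text))[0]
    # JSON: the records are assumed to sit in a list under records_key
    return pd.DataFrame(payload[records_key])


# Body copied from the question; stands in for r.text of a live response
body = r'{"status":true,"currency":[{"currency":"GBP\/CAD","value":"1.7136","date":"2019-01-18 17:19:58","type":"original"}]}'
df = body_to_frame(body)
print(df)
```

With a live response this would be called as body_to_frame(r.text), where r is the object returned by requests.get.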