Pandas read_html gives me Permission denied (403)

I signed up with a provider to get currency prices. When I use pd.read_html('URL') I get a 403 error (permission denied). So I then tried to emulate a browser like this:

import pandas as pd
import matplotlib.pyplot as plt
import html5lib
import requests
%matplotlib inline

### Pretend to be a browser ###
url = 'URL_TO_PROVIDER_WITH_TOKEN'
header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36","X-Requested-With": "XMLHttpRequest"}

r = requests.get(url, headers=header)

currency = pd.read_html('r')

However, this gives me "no tables found". The source looks like this:

{"status":true,"currency":[{"currency":"GBP\/CAD","value":"1.7136","date":"2019-01-18 17:19:58","type":"original"}]}

What am I doing wrong?

EDIT

If there are no tables in the source, how can I get the data into Pandas? As you can see, the data I would like to parse is JSON and looks like this:

{"status":true,"currency":[{"currency":"GBP\/CAD","value":"1.7136","date":"2019-01-18 17:19:58","type":"original"}]}

EDIT

OK, obviously the source was not HTML and had no tables in it. Therefore JSON was the way to go. I managed to save the JSON structure with:

r = requests.get(url, headers=header).json()

But then I am stuck. The output of r looks like this:

{'status': True,
 'currency': [{'currency': 'GBP/CAD',
   'value': '1.7083',
   'date': '2019-01-18 22:59:58',
   'type': 'original'}]}

How do I get these into DataFrame columns? I want 'currency': 'GBP/CAD', 'value': '1.7083' and 'date': '2019-01-18 22:59:58'.

EDIT - SOLUTION

url = 'URL_API_TOKEN'
header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36","X-Requested-With": "XMLHttpRequest"}

r = requests.get(url, headers=header).json()

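# json_normalize is available as pd.json_normalize in pandas 1.0+
# (on older versions: from pandas.io.json import json_normalize)
data = pd.json_normalize(r['currency'])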
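
For reference, here is a minimal sketch of the same step run against the sample payload from the question rather than the live API, so it needs no network access or token. The `payload` dict is copied from the output shown above and stands in for the result of `requests.get(...).json()`. Converting `value` and `date` to numeric and datetime types is an extra step assumed here; it is not part of the original solution.

```python
import pandas as pd

# Sample payload copied from the question; it stands in for
# requests.get(url, headers=header).json().
payload = {
    "status": True,
    "currency": [
        {
            "currency": "GBP/CAD",
            "value": "1.7083",
            "date": "2019-01-18 22:59:58",
            "type": "original",
        }
    ],
}

# Flatten the list of records under "currency" into one row per record.
data = pd.json_normalize(payload["currency"])

# Assumed extra step: the API returns strings, so convert them to real dtypes.
data["value"] = pd.to_numeric(data["value"])
data["date"] = pd.to_datetime(data["date"])

print(data[["currency", "value", "date"]])
```

Because the records under `currency` are flat dictionaries, `pd.DataFrame(payload["currency"])` should give the same columns; `json_normalize` mainly helps when the records are nested.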

Try using: currency = pd.read_html(r)

instead of: currency = pd.read_html('r')

because you are calling read_html with the string "r" as the argument, not the variable r.
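
One caveat: `requests.get` returns a `Response` object, so as far as I know the variable `r` is not something `read_html` can parse directly either; the body text is in `r.text`. And since the body in this question is JSON, `read_html` would still find no tables. Below is a hedged sketch of how the two cases could be told apart. The helper name `frame_from_body` is hypothetical, and it assumes the server labels JSON responses with a `Content-Type` that contains `json`. The HTML branch also assumes an HTML parser such as lxml or html5lib is installed.

```python
import io
import json

import pandas as pd


def frame_from_body(content_type: str, body: str) -> pd.DataFrame:
    """Hypothetical helper: build a DataFrame from a response body."""
    if "json" in content_type.lower():
        # JSON payload: flatten the records under the "currency" key.
        return pd.json_normalize(json.loads(body)["currency"])
    # HTML payload: read_html returns a list of tables; take the first one.
    return pd.read_html(io.StringIO(body))[0]


# Body copied from the question. With a real response, these two values
# would come from r.headers["Content-Type"] and r.text.
body = (
    '{"status":true,"currency":[{"currency":"GBP\\/CAD","value":"1.7136",'
    '"date":"2019-01-18 17:19:58","type":"original"}]}'
)
df = frame_from_body("application/json", body)
print(df)
```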

SLP
