Scraping data from a website graph with Python
As an exercise, I am trying to scrape data from a dynamic graph using Python. The graph can be found at this link (let's say I want the data from the first one).
Now, I was thinking of doing something like:
import json
import urllib.request

src = 'https://marketchameleon.com/Overview/WFT/IV/#_ABSTRACT_RENDERER_ID_11'

with urllib.request.urlopen(src) as url:
    data = url.read()

reply = json.loads(data)
However, I receive an error message on the last line of the code, saying:
JSONDecodeError: Expecting value
"data" is not empty, so I believe there is a problem with the format of the information within it. Does anyone have an idea how to solve this issue? Thanks!
I opened that link and saw that the site loads its data from another URL: https://marketchameleon.com/charts/histStockChartData?p=747&m=12&_=1534060722519
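That would also explain the JSONDecodeError: the page URL presumably returns an HTML document rather than JSON, and json.loads() gives up at the first character. Here is a minimal offline sketch of that failure. The HTML string is a made-up stand-in for the real response, and try_parse_json is a hypothetical helper, not something from the site or from the question.

```python
import json

# Made-up stand-in for the bytes returned by the page URL (HTML, not JSON).
html_body = b"<!DOCTYPE html><html><head><title>Overview</title></head></html>"


def try_parse_json(body):
    """Return (parsed, None) on success, or (None, error message) on failure."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        # exc.msg holds the short reason without the line/column suffix.
        return None, exc.msg


parsed, error = try_parse_json(html_body)
print(parsed, error)
```

A non-empty body is therefore not enough; it has to be JSON, which is why the request should go to the data URL instead of the page URL.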
You can call json.loads() twice, because the GTable field of the response is itself a JSON-encoded string, and send browser-like request headers (urllib2.Request is your friend in Python 2), since the server returns HTTP 500 when the request does not imitate a browser:
import json
import urllib.request

src = 'https://marketchameleon.com/charts/histStockChartData?p=747&m=12'

# Headers copied from a real browser session; without them the server answers HTTP 500.
headers = {
    'Host': 'marketchameleon.com',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,kk;q=0.6'
}

request = urllib.request.Request(src, headers=headers)
with urllib.request.urlopen(request) as response:
    data = response.read()
print(data)

# First pass decodes the outer JSON object.
reply = json.loads(data)
# 'GTable' holds a JSON string, so it needs a second pass.
table = json.loads(reply['GTable'])
print(table)
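To see why two json.loads() calls are needed, here is a small offline sketch that does not touch the network. The key name GTable comes from the code above; the column names and values inside it are invented for illustration and are not the site's real data.

```python
import json

# Invented table contents; only the 'GTable' key name comes from the answer.
inner = {"cols": ["Date", "IV30"], "rows": [["2018-08-10", 0.55]]}

# Build a payload shaped like the response: a JSON object whose 'GTable'
# value is itself a JSON document serialized as a string.
raw = json.dumps({"GTable": json.dumps(inner)}).encode("utf-8")

reply = json.loads(raw)               # first pass: outer object
print(type(reply["GTable"]))          # still a str at this point

table = json.loads(reply["GTable"])   # second pass: the embedded document
print(table["rows"][0])
```

After the first pass, reply['GTable'] is only text; indexing into it as if it were a dict would fail, which is what the second call avoids.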