
Scraping data from website graph with Python

As an exercise, I am trying to scrape data from a dynamic graph using Python. The graph can be found at this link (let's say I want the data from the first one).

Now, I was thinking of doing something like:

src = 'https://marketchameleon.com/Overview/WFT/IV/#_ABSTRACT_RENDERER_ID_11'

import json
import urllib.request

with urllib.request.urlopen(src) as url:
    data = url.read()
    reply = json.loads(data)

However, I receive an error message on the last line of the code:

JSONDecodeError: Expecting value

"data" is not empty, so I believe there is a problem with the format of the information within it. Does anyone have an idea how to solve this issue? Thanks!

I opened that link and saw that the site loads its data from another URL: https://marketchameleon.com/charts/histStockChartData?p=747&m=12&_=1534060722519
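To illustrate why the call in the question fails, here is a minimal sketch. It assumes the page URL returns an HTML document rather than JSON; the sample body and the helper name try_parse_json are made up for illustration and are not the site's real response.

```python
import json

# Hypothetical stand-in for what the page URL returns: HTML, not JSON.
html_body = b"<!DOCTYPE html><html><body>chart page</body></html>"


def try_parse_json(raw):
    """Return the decoded JSON value, or None if raw is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # For HTML input the parser stops at the very first character.
        print("Not JSON:", exc.msg)
        return None


print(try_parse_json(html_body))        # the HTML body cannot be parsed
print(try_parse_json(b'{"ok": true}'))  # a real JSON body parses fine
```

A check like this makes it easier to tell "the response is not JSON at all" apart from "the JSON is malformed".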

You can call json.loads() twice, because the GTable field of the response is itself a JSON-encoded string. You also need to send browser-like headers (urllib2.Request is your friend in Python 2), since the server returns HTTP 500 when the request does not imitate a browser.

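import json
import urllib.request

# Data endpoint the chart page calls in the background (not the page URL itself)
src = 'https://marketchameleon.com/charts/histStockChartData?p=747&m=12'

# Browser-like headers; without them the server answers with HTTP 500
headers = {
    'Host': 'marketchameleon.com',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Upgrade-Insecure-Requests': '1',  # header values are better passed as strings
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,kk;q=0.6'
}
request = urllib.request.Request(src, headers=headers)

# Close the connection once the body has been read
with urllib.request.urlopen(request) as response:
    data = response.read()
print(data)

# First pass: decode the outer JSON object
reply = json.loads(data)

# Second pass: the 'GTable' value is itself a JSON-encoded string
table = json.loads(reply['GTable'])
print(table)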
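The double decoding can be tried offline with a small sketch. The payload below is invented: only the GTable key comes from the answer, while the inner field names (cols, rows), the sample values, and the helper decode_nested are assumptions for illustration.

```python
import json

# Hypothetical payload: the outer object stores 'GTable' as a string
# that itself contains JSON. The inner layout is an assumption.
inner_table = {"cols": ["Date", "IV30"], "rows": [["2018-08-10", 55.1]]}
sample_body = json.dumps({"GTable": json.dumps(inner_table)})


def decode_nested(raw, key="GTable"):
    """Decode the outer JSON, then decode the value under key if it is a string."""
    outer = json.loads(raw)
    inner = outer[key]
    if isinstance(inner, str):
        # The value is JSON text, so it needs a second json.loads() pass.
        inner = json.loads(inner)
    return inner


table = decode_nested(sample_body)
print(table["cols"])
print(table["rows"])
```

The isinstance check keeps the helper usable if the server ever starts returning the table as a nested object instead of a string.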

