Scraping an image with Python, but the image can't be found

Question

I'm trying to scrape the chart image from stockcharts.com given a URL, for example: http://stockcharts.com/h-sc/ui?s=AMZN

However, when I inspect the element in question, its src is not a conventional image URL with a .jpg, .png, etc. suffix. For the link above, the element's src is: http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864

As a result, when I run the following code in Python 2.7, I get an empty file in the same directory as the script:

import urllib
url = "http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864"
filename = "testimg.jpg"
urllib.urlretrieve(url, filename)

Is this a JavaScript-rendered page, or is there something I'm missing, such as a reference to another location?

Answer 1

The site checks the User-Agent header and allows only specific user agents.

You need to change that header to fetch the image. Otherwise, the site returns a 403 Forbidden response.

urllib.urlretrieve does not accept additional headers, so you need to use urllib2.Request and urllib2.urlopen to specify custom headers, and then save the file yourself:

import urllib2

url = "http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864"
filename = "sc.png"
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
u = urllib2.urlopen(req)
with open(filename, 'wb') as f:
    f.write(u.read())
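
For readers on Python 3, where urllib2 was merged into urllib.request, the same idea can be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper names build_request and save_image are hypothetical, and the opener parameter is an assumption added so the download step can be swapped out without touching the network.

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def save_image(url, filename, opener=urllib.request.urlopen):
    """Fetch url with a custom User-Agent and write the raw bytes to filename.

    opener is a hypothetical hook (defaults to urllib.request.urlopen).
    With the default opener, a 403 response raises urllib.error.HTTPError.
    """
    req = build_request(url)
    with opener(req) as response, open(filename, "wb") as f:
        # Stream the response body to disk in binary mode.
        shutil.copyfileobj(response, f)
    return filename


# Example call (performs a real network request, so it is left commented out):
# save_image("http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864", "sc.png")
```

Opening the output file in binary mode matters here, since the response body is image data rather than text.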

Solution 1
-1 2016-11-19 07:03:44
