I'm trying to scrape the chart image from a stockcharts.com URL, for example: http://stockcharts.com/h-sc/ui?s=AMZN
However, when I inspect the element in question, its src is not a conventional image URL with a .jpg, .png, etc. suffix. For the link above, the element's src is: http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864
As a result, when I run the following code in Python 2.7, I get an empty file in the same directory as the script:
import urllib
url = "http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864"
filename = "testimg.jpg"
urllib.urlretrieve(url, filename)
Is this a JavaScript-rendered page, or am I missing something, such as a reference to another location?
The site checks the User-Agent header; it allows only specific user agents. You need to change the header to fetch the image. Otherwise, the site returns a 403 Forbidden response.
urllib.urlretrieve does not accept additional headers, so you need to use urllib2.urlopen / urllib2.Request to specify custom headers and save the file yourself:
import urllib2
url = "http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864"
filename = "sc.png"
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
u = urllib2.urlopen(req)
with open(filename, 'wb') as f:
    f.write(u.read())
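For readers on Python 3, where urllib2 was merged into urllib.request, the same approach might look roughly like the sketch below. The helper names build_request and download, and the opener parameter, are hypothetical and only there for illustration; whether 'Mozilla/5.0' is still an accepted User-Agent for this site is an assumption carried over from the answer above.

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    # Attach a browser-like User-Agent header (hypothetical helper).
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, filename, opener=urllib.request.urlopen):
    # Fetch the URL with the custom header and stream the body to disk.
    # `opener` is injectable so the function can be exercised offline.
    req = build_request(url)
    with opener(req) as response, open(filename, "wb") as f:
        shutil.copyfileobj(response, f)
    return filename


# Example call (performs a real network request):
# download("http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864", "sc.png")
```

Streaming with shutil.copyfileobj avoids holding the whole image in memory, although for a small chart image a plain f.write(response.read()) would work just as well.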