When scraping an image's src URL, I get data:image/jpeg;base64

Question

I am trying to scrape an image URL from a website using Python's urllib2.

Here is the code I use to fetch the HTML:

import urllib2

req = urllib2.Request(url, headers=urllib2Header)  # urllib2Header: dict of request headers defined earlier
htmlStr = urllib2.urlopen(req, timeout=15).read()

When I inspect the page in the browser, the image's HTML looks like this:

<img id="main-image" src="http://abcd.com/images/41Q2VRKA2QL._SY300_.jpg" alt="" rel="" style="display: inline; cursor: pointer;">

However, in the htmlStr I fetched, the src attribute holds an inline base64 image (a data: URI) instead:

<img id="main-image" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQU....">
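The data: URI above embeds the image itself: everything after `base64,` is the JPEG file encoded as base64 (the leading `/9j/` decodes to the bytes `FF D8 FF`, the JPEG start-of-image marker). As a minimal sketch, assuming the src always has the `data:<mime>;base64,<payload>` shape shown here, the hypothetical helper `decode_data_uri` splits out the MIME type and decodes the bytes, so you can at least save the image even without its original URL.

```python
import base64


def decode_data_uri(uri):
    """Split a base64 data: URI into (mime_type, raw_bytes).

    Hypothetical helper: only handles the "data:<mime>;base64,<payload>"
    form seen in the question, not percent-encoded (non-base64) data URIs.
    """
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI: %r" % uri[:40])
    mime_type = header[len("data:"):-len(";base64")]
    # The payload is the file's bytes, base64-encoded; decoding yields the image.
    return mime_type, base64.b64decode(payload)
```

For example, `mime, data = decode_data_uri(src)` followed by writing `data` to a file opened in `"wb"` mode saves the image to disk. It cannot recover the original URL, because a data URI does not store one.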

Why does this happen? Is there a way to get the original image URL rather than the base64 image string?

Thanks.
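Whether the original URL survives anywhere in htmlStr depends on the site. If the page puts a placeholder data URI in src and fills in the real address with JavaScript later, the URL may still appear elsewhere in the raw HTML (for example in an inline script or another attribute); that is an assumption to check against the actual page, not something the question confirms. A rough heuristic sketch: the hypothetical `find_image_urls` scans the raw HTML for absolute http(s) URLs ending in a common image extension.

```python
import re

# Heuristic, not a site-specific fix: match absolute http(s) URLs that end in
# a common image extension. The match stops right after the extension, so any
# query string is dropped; \b keeps ".jpgfoo" from counting as ".jpg".
IMAGE_URL_RE = re.compile(
    r"""https?://[^\s"'<>]+?\.(?:jpe?g|png|gif|webp)\b""",
    re.IGNORECASE,
)


def find_image_urls(html):
    """Return unique image-like URLs from html, in order of first appearance."""
    urls = []
    for match in IMAGE_URL_RE.finditer(html):
        url = match.group(0)
        if url not in urls:
            urls.append(url)
    return urls
```

Running `find_image_urls(htmlStr)` and looking for the file name you saw in the browser (e.g. `41Q2VRKA2QL`) is a quick way to test that assumption. If nothing turns up, the URL is likely produced by JavaScript at runtime, which urllib2 does not execute.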

Answer 1

You could use BeautifulSoup to parse the HTML and read the image's src attribute.

Example:

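import urllib2
from bs4 import BeautifulSoup

# urlopen needs a full URL, including the http:// scheme
url = "http://www.theurlyouwanttoscrape.com"
html = urllib2.urlopen(url).read()

# name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")

# the id in the question's HTML is "main-image" (hyphen, not underscore)
img_src = soup.find('img', {'id': 'main-image'})['src']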
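Note that BeautifulSoup only parses what urllib2 downloaded, so run against the question's htmlStr it returns the same data URI. The browser's inspector shows the page after its scripts have run, which is likely why the two differ (an assumption about this particular site). The sketch below swaps in Python 3's standard-library `html.parser` for bs4 so it runs without extra packages; the hypothetical `img_src_by_id` reads the src of the `<img>` with a given id, and `is_data_uri` flags the placeholder case so a scraper can detect it and fall back to another strategy.

```python
from html.parser import HTMLParser


class ImgSrcFinder(HTMLParser):
    """Record the src of the first <img> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased.
        attrs = dict(attrs)
        if tag == "img" and self.src is None and attrs.get("id") == self.target_id:
            self.src = attrs.get("src")


def img_src_by_id(html, target_id="main-image"):
    """Return the src of the matching <img>, or None if there is none."""
    finder = ImgSrcFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.src


def is_data_uri(src):
    """True when src is an inline data: URI rather than a URL to fetch."""
    return src is not None and src.startswith("data:")
```

Checking `is_data_uri(img_src_by_id(htmlStr))` first tells you whether you got the real address or the inline placeholder before you try to download anything.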

solution1
0 ACCEPTED 2014-03-12 00:27:37