I am trying to download pictures from a site. I figured out that the reason I can't find the picture URL shows up right at the beginning of the code.
The problem is that urlopen downloads different HTML than I get in the browser.
The site is here. When I look at the HTML in the browser, I can see this part:
<a href="#" data-trigger="cmg-rotate-big">
<img src="/image/product/eca412b9-9484-4046-8bee-8400fde1d5fe/?width=400" alt="" data-cm-index="0" style="width: 400px; height: 400px; margin-left: 0px; opacity: 1;">
<img src="/image/product/014a128e-fa7b-4817-9d76-7bdf296de8de/?width=400" alt="" data-cm-index="1" style="width: 0px; height: 400px; margin-left: 200px; opacity: 0.5;">
</a>
But with this code:
import urllib2
from bs4 import BeautifulSoup
text = urllib2.urlopen(url).read()
soup = BeautifulSoup(text, "html.parser")
print(soup)
the same part is only:
<a data-trigger="cmg-rotate-big" href="#">
<img alt="" data-cm-index="0" src=""/>
<img alt="" data-cm-index="1" src=""/>
</a>
So I can't extract the src of the image because it is missing. Where is the problem, please?
Thank you!
The src values are in the downloaded HTML. There is no need to simulate JavaScript.
import requests
import bs4

url = 'https://ceskamincovna.cz/stribrna-mince-na-kolech---skoda-felicia-proof-1493-11549-d/'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# Collect every <img> tag and keep those that point at a product image.
imgs = soup.find_all('img')
for img in imgs:
    # .get() avoids a KeyError on <img> tags that have no src attribute.
    if '/image/product/' in img.get('src', ''):
        print(img['src'])
Output:
/image/product/eca412b9-9484-4046-8bee-8400fde1d5fe/?width=250
/image/product/014a128e-fa7b-4817-9d76-7bdf296de8de/?width=250
/image/product/0ec5b392-0f8a-4013-a448-a1b82578c008/?width=250
/image/product/9bc26462-5f11-4994-be6e-fcde1d97c5f3/?width=250
/image/product/7da1f235-f322-4a57-b0ca-07964f0a7d37/?width=250
/image/product/bd781b17-8482-4a4f-80f3-5fa55b9bc4c1/?width=250
/image/product/f5d4ade9-cac0-4c15-a935-da125b408da1/?width=250
/image/product/f4d6fb41-af72-4510-a70c-0a9893656e93/?width=250
/image/product/6136afe7-7444-42cd-858b-af66ca4ca6de/?width=140
/image/product/a459eb25-dd12-446a-9517-341d128c9571/?width=140
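The filter above keeps only img tags whose src contains /image/product/. Here is a minimal sketch of that filter on a small inline sample, so it can be tried without hitting the site. It uses the standard library's html.parser instead of BeautifulSoup, which is a swap made for this example only. The sample HTML is made up for illustration (one empty src as in the question, one tag with no src at all, one product path from the output above), and the class name ProductImageCollector is hypothetical.

```python
from html.parser import HTMLParser

# Made-up sample: an empty src, a missing src, and one product image path.
SAMPLE = '''
<a data-trigger="cmg-rotate-big" href="#">
<img alt="" data-cm-index="0" src=""/>
<img alt="logo">
<img src="/image/product/eca412b9-9484-4046-8bee-8400fde1d5fe/?width=250">
</a>
'''

class ProductImageCollector(HTMLParser):
    """Collects src values of <img> tags that point at product images."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        # A missing or valueless src becomes an empty string, so it is skipped.
        src = dict(attrs).get('src') or ''
        if '/image/product/' in src:
            self.sources.append(src)

collector = ProductImageCollector()
collector.feed(SAMPLE)
print(collector.sources)
```

Only the third tag survives the filter. The tags with an empty or missing src are skipped without raising an error.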
If you want the width = 400:
import requests
import bs4

url = 'https://ceskamincovna.cz/stribrna-mince-na-kolech---skoda-felicia-proof-1493-11549-d/'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'html.parser')

imgs = soup.find_all('img')
for img in imgs:
    # .get() avoids a KeyError on <img> tags that have no src attribute.
    if '/image/product/' in img.get('src', ''):
        # Drop the original width parameter and request width=400 instead.
        print(img['src'].split('?width=')[0] + '?width=400')
Output:
/image/product/eca412b9-9484-4046-8bee-8400fde1d5fe/?width=400
/image/product/014a128e-fa7b-4817-9d76-7bdf296de8de/?width=400
/image/product/0ec5b392-0f8a-4013-a448-a1b82578c008/?width=400
/image/product/9bc26462-5f11-4994-be6e-fcde1d97c5f3/?width=400
/image/product/7da1f235-f322-4a57-b0ca-07964f0a7d37/?width=400
/image/product/bd781b17-8482-4a4f-80f3-5fa55b9bc4c1/?width=400
/image/product/f5d4ade9-cac0-4c15-a935-da125b408da1/?width=400
/image/product/f4d6fb41-af72-4510-a70c-0a9893656e93/?width=400
/image/product/6136afe7-7444-42cd-858b-af66ca4ca6de/?width=400
/image/product/a459eb25-dd12-446a-9517-341d128c9571/?width=400
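The paths printed above are relative to the site root, so they need the site's scheme and host before they can be downloaded. Here is a minimal sketch, assuming Python 3 and only the standard library, that resolves a path against the page URL and sets the width through urllib.parse rather than string splitting. The helper name absolute_image_url is hypothetical, and it assumes width is the only query parameter worth keeping.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit, urlencode

PAGE_URL = 'https://ceskamincovna.cz/stribrna-mince-na-kolech---skoda-felicia-proof-1493-11549-d/'

def absolute_image_url(page_url, src, width=400):
    """Resolve a relative src against the page URL and set its width query."""
    parts = urlsplit(urljoin(page_url, src))
    # Replace the whole query string with the requested width.
    return urlunsplit(parts._replace(query=urlencode({'width': width})))

print(absolute_image_url(
    PAGE_URL,
    '/image/product/eca412b9-9484-4046-8bee-8400fde1d5fe/?width=250',
))
```

The resulting URL could then be passed to requests.get and the response content written to a file in binary mode.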