
How do I download the highest resolution image from a JavaScript rendered responsive page?

Suppose this is the page: "https://www.dior.com/en_us/products/couture-943C105A4655_C679-technical-fabric-cargo-pants-covered-in-tulle", from which I want to download all the images of the product showcased (4 images in this case).

I am using Selenium and extracting image links. The problem is that when I click the images they open at up to 2000x3000 pixels, but the links I extract only give me versions around 480 pixels. Where are the large images stored? How do I extract them? (Basically, I want to download the maximum possible size of those images.)

Within the source code of the page you provided, there is JSON data that supplies the links and content for the page. Once that data is extracted from the script in the source, it is easy to retrieve the high-resolution links and download the images. If you have not already, run pip install requests and pip install bs4 (the code below also uses the lxml parser, so pip install lxml as well).

import json
import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.dior.com/en_us/products/couture-943C105A4655_C679-technical-fabric-cargo-pants-covered-in-tulle'

# Fetch the raw HTML; the gallery data is embedded in an inline script.
r = requests.get(url)
r.raise_for_status()
soup = BeautifulSoup(r.text, 'lxml')

# Pick the inline script that assigns window.initialState.
script = next(s.text for s in soup.find_all('script') if 'window.initialState' in s.text)

# Take everything from the first '{' to the last '}' and parse it as JSON.
json_data_s = re.search(r'{.+}', script, re.DOTALL).group(0)
json_data = json.loads(json_data_s)

for holder in json_data['CONTENT']['cmsContent']['elements']:
    if holder.get('type') == 'PRODUCTMEDIAS':
        for image in holder['items']:
            # imageZoom is the full-size version of each gallery image.
            name = image['galleryImages']['imageZoom']['viewCode']
            img_src = image['galleryImages']['imageZoom']['uri']
            image_page = requests.get(img_src)
            image_page.raise_for_status()
            with open(name + '.jpg', 'wb') as img:
                img.write(image_page.content)
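
The regex in the script above grabs everything between the first '{' and the last '}', which only parses if nothing else containing braces follows the JSON in that script tag. As a minimal sketch of a more forgiving approach, json.JSONDecoder.raw_decode can parse one JSON value starting at a given offset and ignore whatever trails it. The helper name extract_state and the sample string are made up for illustration; only the window.initialState marker comes from the code above.

```python
import json

MARKER = "window.initialState"  # marker used by the script filter above


def extract_state(script_text, marker=MARKER):
    """Return the first JSON object that follows `marker` in script_text."""
    start = script_text.index(marker)          # ValueError if the marker is missing
    brace = script_text.index("{", start)      # first '{' after the marker
    # raw_decode parses a single JSON value at `brace` and stops there,
    # so trailing JavaScript does not break the parse.
    state, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return state


# Made-up stand-in for the page's inline script (not the real page content).
sample = (
    'window.initialState = {"CONTENT": {"cmsContent": {"elements": []}}}; '
    'var other = {"a": 1};'
)
state = extract_state(sample)
print(state["CONTENT"]["cmsContent"]["elements"])
```

In the script above this would replace the re.search and json.loads lines with json_data = extract_state(script).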

Note: the images you were downloading before were the thumbnails.
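
Since the fix is to read the imageZoom entry instead of the thumbnail, it can help to collect the links first and inspect them before downloading anything. The following is a sketch under the assumption that the parsed state has the same key layout as in the code above; the function name collect_zoom_images, the sample data, and the example.com URL are hypothetical.

```python
def collect_zoom_images(state):
    """Return (view_code, uri) pairs for gallery items that have an imageZoom entry."""
    pairs = []
    elements = state.get("CONTENT", {}).get("cmsContent", {}).get("elements", [])
    for holder in elements:
        if holder.get("type") != "PRODUCTMEDIAS":
            continue  # only the product media block carries the gallery
        for item in holder.get("items", []):
            zoom = item.get("galleryImages", {}).get("imageZoom")
            # Skip items that lack a zoom variant instead of raising KeyError.
            if zoom and "uri" in zoom and "viewCode" in zoom:
                pairs.append((zoom["viewCode"], zoom["uri"]))
    return pairs


# Made-up state mirroring the key layout above; the URL is a placeholder.
sample_state = {
    "CONTENT": {
        "cmsContent": {
            "elements": [
                {"type": "TEXT"},
                {
                    "type": "PRODUCTMEDIAS",
                    "items": [
                        {"galleryImages": {"imageZoom": {
                            "viewCode": "E01",
                            "uri": "https://example.com/E01_zoom.jpg",
                        }}},
                        {"galleryImages": {}},  # no zoom variant, skipped
                    ],
                },
            ]
        }
    }
}

for view_code, uri in collect_zoom_images(sample_state):
    print(view_code, uri)
```

Each returned pair can then be passed to requests.get and written to view_code + '.jpg' as in the original loop.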

