
Selenium: how to extract all images from a website (including ones from JavaScript and CSS)

I need to extract all images from a website using Selenium. This should include images of any extension (png, jpg, svg, etc.) from HTML, CSS and JavaScript. This means that simply extracting all the <img> elements is not sufficient (e.g. any image loaded from a CSS style will be missed):

from selenium.webdriver.common.by import By
images = driver.find_elements(By.TAG_NAME, 'img')  # not sufficient

Is there a smarter approach than downloading and parsing every CSS and JavaScript file the website requires and using a regex to look for image files?

It would be ideal if there were a way to simply list the resources downloaded after the page loads, similar to the Network tab in Chrome DevTools.

Any ideas?

This answer is adapted from "How to access Network panel on google chrome developer tools with selenium?"; I only updated it slightly.

resources = driver.execute_script("return window.performance.getEntriesByType('resource');")
for resource in resources:
    if resource['initiatorType'] == 'img':  # check for other types (e.g. 'css') if needed
        print(resource['name'])  # the URL the resource was loaded from
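Since the question also asks for images loaded from CSS and JavaScript, filtering on initiatorType == 'img' alone may miss some of them; CSS background images, for example, are reported with initiatorType 'css'. Below is a minimal sketch of one way to widen the filter by also checking the file extension of each URL. The helper names (filter_image_urls, collect_image_urls) and the IMAGE_EXTENSIONS list are assumptions made for illustration, not part of Selenium or of the original answer.

```python
from urllib.parse import urlparse

# Assumed list of image extensions; extend it to suit the site being scraped.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico", ".bmp")


def filter_image_urls(entries):
    """Return de-duplicated URLs of resource entries that look like images.

    `entries` is a list of dicts shaped like the Resource Timing entries
    returned by execute_script in the snippet above.
    """
    urls = []
    for entry in entries:
        url = entry.get("name", "")
        # Compare only the path, so query strings such as ?v=2 are ignored.
        path = urlparse(url).path.lower()
        looks_like_image = (
            entry.get("initiatorType") == "img" or path.endswith(IMAGE_EXTENSIONS)
        )
        if looks_like_image and url not in urls:
            urls.append(url)
    return urls


def collect_image_urls(driver):
    """Hypothetical helper: `driver` is an already-loaded Selenium WebDriver."""
    entries = driver.execute_script(
        "return window.performance.getEntriesByType('resource');"
    )
    return filter_image_urls(entries)
```

Calling collect_image_urls(driver) after the page has finished loading returns the matching URLs in the order they were requested. Images served from URLs without a recognizable extension are only caught when the browser reports them as initiated by an <img> element, so the extension list is a heuristic rather than a guarantee.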
