
Selenium: how to extract all images from a website (including ones loaded from JavaScript and CSS)

I need to extract all images from a website using Selenium. This should include images of any extension (png, jpg, svg, etc.) referenced from HTML, CSS, and JavaScript. This means that simply collecting all the <img> elements is not sufficient (e.g. any image loaded from a CSS style will be missed):

from selenium.webdriver.common.by import By  # Selenium 4 replaced the find_elements_by_* helpers
images = driver.find_elements(By.TAG_NAME, 'img')  # not sufficient

Is there anything smarter to do than downloading and parsing every CSS file and JavaScript script the website requires and using a regex to look for image files?

It would be ideal if there were a way to simply inspect the resources downloaded after the page loads, similar to the Network tab in Chrome DevTools.

Any idea?

This answer is adapted from "How to access Network panel on google chrome developer tools with selenium?", with a few small updates.

# Ask the browser for the Resource Timing entries of everything the page has fetched so far.
resources = driver.execute_script("return window.performance.getEntriesByType('resource');")
for resource in resources:
    if resource['initiatorType'] == 'img':  # images loaded from CSS report 'css'; check other types if needed
        print(resource['name'])  # the URL the resource was fetched from
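
Since the question asks for images loaded from CSS and JavaScript too, filtering on initiatorType == 'img' alone may not be enough. The following is a minimal sketch, not part of the original answer, that also keeps any entry whose URL path ends in a known image extension. The helper name image_urls and the IMAGE_EXTENSIONS set are hypothetical; adjust the extension list to your needs.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension list; extend it for other formats you care about.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico", ".bmp", ".avif"}


def image_urls(resources, extensions=IMAGE_EXTENSIONS):
    """Return unique image URLs from Resource Timing entries, in first-seen order.

    `resources` is the list of dicts returned by
    driver.execute_script("return window.performance.getEntriesByType('resource');")
    """
    found = []
    for entry in resources:
        url = entry.get("name", "")
        # Look only at the URL path so query strings such as ?v=2 are ignored.
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if entry.get("initiatorType") == "img" or ext in extensions:
            if url not in found:
                found.append(url)
    return found
```

With a live driver this could be called as image_urls(driver.execute_script("return window.performance.getEntriesByType('resource');")). Note that inline SVG markup and data: URIs are not fetched as separate resources, so they will not show up in these entries, and images served from extensionless URLs by CSS or scripts would still be missed by this extension check.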

