
How do I scrape full-sized images from a website?

I am trying to obtain clinical images of psoriasis patients from these two websites for research purposes:

http://www.dermis.net/dermisroot/en/31346/diagnose.htm

http://dermatlas.med.jhmi.edu/derm/

For the first site, I tried just saving the page with Firefox, but that only saved the thumbnails and not the full-sized images. I was able to access the full-sized images using a Firefox add-on called "downloadthemall", but it saved each image as part of a new HTML page, and I do not know of any way to extract just the images.

I also tried logging in to one of my university's Linux machines and using wget to mirror the websites, but I was not able to get it to work and am still unsure why.

Consequently, I am wondering whether it would be easy to write a short script (or whatever method is easiest) to (a) obtain the full-sized images linked to on the first website, and (b) obtain all full-sized images on the second site with "psoriasis" in the filename.

I have been programming for a couple of years, but have zero experience with web development and would appreciate any advice on how to go about doing this.

Why not use wget to recursively download images from the domain? Here is an example:

wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://www.domain.com

Here is the man page: http://www.gnu.org/software/wget/manual/wget.html

Try HTTrack Website Copier - it will download all the images on the website. You can also try http://htmlparser.sourceforge.net/; it can capture a website along with its resources if you use org.htmlparser.parserapplications.SiteCapturer.
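
If you would rather write the short script yourself, here is a minimal Python sketch using only the standard library. It assumes the index page links directly to the full-sized images through <a href> tags; the URL, the "psoriasis" keyword, and the output folder are placeholders to adapt to the actual page structure (the dermis.net pages, for example, link each thumbnail to an intermediate page rather than straight to the JPEG, so an extra request per image may be needed there).

import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX_URL = "http://dermatlas.med.jhmi.edu/derm/"  # page that links to the full-sized images (placeholder)
KEYWORD = "psoriasis"                              # keep only files whose name contains this; "" keeps everything
OUT_DIR = "images"

class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags that point to image files."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.lower().endswith((".jpg", ".jpeg", ".png", ".gif")):
                    self.links.append(value)

def main():
    os.makedirs(OUT_DIR, exist_ok=True)
    html = urllib.request.urlopen(INDEX_URL).read().decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)

    for link in parser.links:
        url = urljoin(INDEX_URL, link)          # resolve relative links against the index page
        filename = os.path.basename(url)
        if KEYWORD and KEYWORD not in filename.lower():
            continue                            # skip images whose filename lacks the keyword
        print("downloading", url)
        urllib.request.urlretrieve(url, os.path.join(OUT_DIR, filename))

if __name__ == "__main__":
    main()

Edit the constants at the top and run it with python3; setting KEYWORD to an empty string disables the filename filter, which covers part (a), while leaving it as "psoriasis" covers part (b).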
