
Download images from website

I want to have a local copy of a gallery on a website. The gallery shows the pictures at domain.com/id/1 (the id increases in increments of 1), and each image is stored at pics.domain.com/pics/original/image.format. The exact HTML containing the image looks like this:

<div id="bigwall" class="right"> 
    <img border=0 src='http://pics.domain.com/pics/original/image.jpg' name='pic' alt='' style='top: 0px; left: 0px; margin-top: 50px; height: 85%;'> 
</div>

So I want to write a script that does something like this (in pseudo-code):

for (id = 1; id <= 151468; id++) {
    page = "http://domain.com/id/" + id.toString();
    src = returnSrc(page); // searches the page's HTML for the img with name='pic' and returns its src as a string
    getImg(src);           // downloads the file named in src
}

I'm not sure exactly how to do this, though. I suppose I could do it in bash: use wget to download the HTML, search the HTML for http://pics.domain.com/pics/original/, then use wget again to save the file, remove the HTML file, increment the id, and repeat. The only thing is that I'm not good at handling strings, so if anyone could tell me how to search for the URL and fill in the file name and format, I should be able to get the rest going. Or, if my method is stupid and you have a better one, please share.
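
Something like this is roughly what I have in mind as a bash version of the pseudo-code above (an untested sketch; it assumes GNU grep and that the src attribute is single-quoted exactly as in the snippet, so the pattern may need adjusting):

# sketch: one page at a time, as described above
for id in $(seq 1 151468); do
    wget -q -O page.html "http://domain.com/id/$id"     # download the gallery page
    src=$(grep -o "http://pics.domain.com/pics/original/[^']*" page.html | head -n 1)
    [ -n "$src" ] && wget -q "$src"                      # download the image it points to
    rm -f page.html                                      # clean up and move to the next id
done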

# get all pages
curl 'http://domain.com/id/[1-151468]' -o '#1.html'

# get all images
grep -oh 'http://pics.domain.com/pics/original/.*jpg' *.html >urls.txt

# download all images
sort -u urls.txt | wget -i-
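
A variation on the same idea, in case you don't want ~150,000 HTML files sitting on disk afterwards: with no -o option, curl writes every page to stdout, so the image URLs can be pulled straight out of the stream (again assuming the src attribute is single-quoted as in the snippet above):

# fetch, extract, and download in one pipeline, without saving the pages
curl -s 'http://domain.com/id/[1-151468]' \
  | grep -o "http://pics.domain.com/pics/original/[^']*" \
  | sort -u \
  | wget -i-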
