Grab image links from HTML website using PowerShell

I'd like to download some image galleries in bulk. The images are offered for free, with no permission needed. For the life of me, I cannot get it to work. This is what I have so far. What $pattern spits out is the whole HTML line, not just the image link. Are there any pointers you can give me? The loop is set to run only once for testing purposes; eventually it will go through all the pages, which are numbered sequentially.

# Variables
$i=1        # Webpage Counter
$j=1        # Image Counter
$rootDir = "http://website.com/sport/galleries/"
$saveDir = "C:\Users\user\Desktop\"
$webpagetxt = "C:\Users\user\Desktop\page.txt"
$links = "C:\Users\user\Desktop\links.txt"
$regex = "http://website.com/galleries/[0-9]*/[^\.]*.JPG"

# Create folder to download to
#New-Item -Name SiouxSportsGalleries -ItemType directory

# Start Web Client
$client = New-Object System.Net.WebClient

# Main loop to get image links and download
For($i=10; $i -le 10; $i++){

    # Download source code of the web page.
    $url = $rootDir+$i+'.htm'
    $webclient = new-object System.Net.WebClient
    $webpage = $webclient.DownloadString($url)
    $webpage > "$webpagetxt"

    # Parse web page and find image link.
    $pattern = Get-Content $webpagetxt | Select-String -pattern $regex -Allmatches
    echo "This is the link" $pattern
    #$pattern > $links

}

You need to extract the value that was matched. Select-String returns MatchInfo objects, and when you echo one, what gets displayed is $pattern.ToString(). ToString() returns the whole line, not the matched value. This will return only the links:

Get-Content $webpagetxt | Select-String -pattern $regex -Allmatches | % { $_.Matches | % { $_.Value } }
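A small sketch of that difference, assuming a made-up one-line HTML snippet in place of the real page (the URLs are placeholders that fit the question's $regex): the MatchInfo object's ToString() gives back the whole line, while its Matches collection holds only the matched text.

```powershell
# Hypothetical one-line HTML snippet standing in for the downloaded page
$sampleHtml = '<p><img src="http://website.com/galleries/10/photo1.JPG"><img src="http://website.com/galleries/10/photo2.JPG"></p>'
$regex = "http://website.com/galleries/[0-9]*/[^\.]*.JPG"

# One MatchInfo object per matching input line
$result = $sampleHtml | Select-String -Pattern $regex -AllMatches

# ToString() returns the whole line, which is what echo displayed in the question
$line = $result.ToString()

# The Matches collection holds only the text the regex actually matched
$links = $result.Matches | ForEach-Object { $_.Value }
$links
```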

By the way, instead of saving the web page and reopening it with Get-Content, you can simply split the string on line breaks to get an array (if that was the only reason you saved it). :-)

$webpage -split "`n" | Select-String -pattern $regex -Allmatches | % { $_.Matches | % { $_.Value } }
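Another option, swapped in here and not part of the original answer, is to skip the splitting entirely and run the regex over the whole page string with .NET's [regex]::Matches. This is a sketch under a few assumptions: the page content is a made-up here-string, and the pattern is a tightened variant (escaped dots, and `[^\s.]+` so a match cannot run across line breaks, which becomes possible once the page is searched as one string).

```powershell
# Hypothetical page content spanning several lines (placeholder URLs)
$webpage = @"
<html>
<img src="http://website.com/galleries/10/a.JPG">
<img src="http://website.com/galleries/10/b.jpg">
<img src="http://website.com/galleries/10/a.JPG">
</html>
"@

# Assumed, tightened pattern: literal dots are escaped and [^\s.]+ cannot
# cross a line break, which matters when the whole page is one string
$regex = "http://website\.com/galleries/[0-9]+/[^\s.]+\.jpg"

# [regex]::Matches is case-sensitive unless told otherwise; IgnoreCase mirrors
# Select-String's default. Select-Object -Unique drops repeated links.
$links = [regex]::Matches($webpage, $regex, 'IgnoreCase') |
    ForEach-Object { $_.Value } |
    Select-Object -Unique
$links
```

Deduplicating is optional; it only helps if a gallery page links the same image more than once.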

EDIT: To download the images, you can extend the pipeline with another ForEach-Object (%) loop:

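$imageRoot = "http://website.com/galleries/"   # must match the start of the links found by $regex
$saveDir = "C:\Users\user\Desktop\"
$wb = New-Object System.Net.WebClient
$webpage -split "`n" | Select-String -pattern $regex -Allmatches | % { $_.Matches | % { $_.Value } } | % {
    #Get local path: swap the URL prefix for the save folder and use Windows separators
    $local = $_.Replace($imageRoot, $saveDir).Replace('/', '\')
    #Create an empty file (and any missing folders) at that path
    $file = New-Item $local -ItemType file -Force
    #Download the image into it
    $wb.DownloadFile($_, $file.FullName)
}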
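If relying on a string prefix for the local path feels fragile, a hypothetical helper like the one below (the name Get-LocalImagePath is illustrative, not from the answer) derives the path from the URL itself via System.Uri, so it does not matter which root the page URLs use. It is a sketch that assumes Windows-style save paths as in the question, and it builds them with plain string operations.

```powershell
# Hypothetical helper (not part of the original answer): turn an image URL into a
# path under the save folder, keeping the URL's folder structure.
function Get-LocalImagePath {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [Parameter(Mandatory)] [string] $SaveDir
    )
    # AbsolutePath is everything after the host, e.g. "/galleries/10/photo1.JPG";
    # UnescapeDataString turns %20 and similar back into normal characters
    $relative = [System.Uri]::UnescapeDataString(([System.Uri] $Url).AbsolutePath).TrimStart('/')
    # Build a Windows-style path with plain string operations
    $SaveDir.TrimEnd('\') + '\' + $relative.Replace('/', '\')
}

Get-LocalImagePath -Url 'http://website.com/galleries/10/photo1.JPG' -SaveDir 'C:\Users\user\Desktop\'
```

The returned string can be passed to New-Item -Force and DownloadFile in the same way as $local in the loop.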

Select-String returns MatchInfo objects with properties. Pipe one to Get-Member to see what's available. You'll want the Matches property, e.g. $pattern.Matches. See example 9 in the Select-String documentation.
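A quick way to explore that object, sketched with a made-up line of HTML and a simplified pattern (both are assumptions, not taken from the question): list the property names through Get-Member, then read Line, LineNumber and Matches directly.

```powershell
# Hypothetical single line of HTML piped in instead of the real page
$pattern = '<img src="http://website.com/galleries/10/photo1.JPG">' |
    Select-String -Pattern 'galleries/[0-9]+/[^"]+\.jpg' -AllMatches

# Names of the properties on the MatchInfo object (Line, LineNumber, Matches, ...)
$names = $pattern | Get-Member -MemberType Property | Select-Object -ExpandProperty Name
$names

$pattern.Line             # the whole input line
$pattern.LineNumber       # position of the line in the piped input
$pattern.Matches[0].Value # only the matched text
```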


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM