
Web scraping an HTML page with PowerShell: get the latest release notes version number from Adobe

I am trying to write a PowerShell script that tracks the latest Adobe Reader releases. The URL below lists the latest releases, but I did not get very far with PowerShell web scraping.

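$t = "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/"

# ParsedHtml uses the Internet Explorer DOM and exists only in Windows PowerShell (5.1 and earlier)
$r = Invoke-WebRequest -Uri $t
$r.ParsedHtml.body.getElementsByTagName('Div')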
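
ParsedHtml depends on the Internet Explorer HTML engine, so it is only populated in Windows PowerShell when -UseBasicParsing is not used; PowerShell 6 and later do not have the property at all. A portable fallback is to work on the raw Content string instead. This is a minimal sketch under that assumption: the helper name Get-ListItemText is hypothetical, and the regular expression assumes simple, non-nested <li> markup, so it is not a general HTML parser.

```powershell
# Hypothetical helper: return the visible text of each <li> element in raw HTML.
# Needs only a string, so it works in Windows PowerShell and PowerShell 6+.
function Get-ListItemText {
    param(
        [Parameter(Mandatory)]
        [string]$Html
    )

    # (?s) lets . span line breaks; <li(?:\s[^>]*)?> matches <li> or <li class="...">
    # but not <link ...>; the lazy .*? stops at the first closing </li>.
    $pattern = '(?s)<li(?:\s[^>]*)?>(.*?)</li>'

    foreach ($m in [regex]::Matches($Html, $pattern)) {
        # Remove inner tags (such as <a>), decode entities like &amp;,
        # then collapse runs of whitespace.
        $text = $m.Groups[1].Value -replace '<[^>]+>', ''
        $text = [System.Net.WebUtility]::HtmlDecode($text)
        ($text -replace '\s+', ' ').Trim()
    }
}

# Usage against the live page (needs network access):
# $r = Invoke-WebRequest -Uri $t -UseBasicParsing
# Get-ListItemText -Html $r.Content
```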

You can target the raw HTML with regular expressions to extract the update information. One way is to split the content on the <li> tag and iterate over the pieces with a switch -Regex statement, emitting one object per release.
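
A minimal sketch of the split-and-switch mechanism on its own, run on a small made-up HTML fragment rather than Adobe's page (the markup is an assumption for illustration): -split breaks the string at every <li>, and switch -Regex tests each piece, exposing the named groups of the matching pattern through $matches.

```powershell
# Illustrative fragment only; the real page's HTML may differ.
$html = '<ul><li><a href="one.html">Item one</a></li><li><a href="two.html">Item two</a></li></ul>'

# -split takes a regex; '<li>' has no special characters, so it splits on the
# literal tag. The first piece ('<ul>') contains no link and will not match.
$pieces = $html -split '<li>'

# switch -Regex runs the pattern against every piece in turn.
$links = switch -Regex ($pieces) {
    'href="(?<URL>[^"]+)">(?<Text>[^<]+)' {
        [PSCustomObject]@{
            Text = $matches.Text
            URL  = $matches.URL
        }
    }
}

$links
```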

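$baseuri = 'https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC'

# -UseBasicParsing avoids the Internet Explorer DOM in Windows PowerShell;
# in PowerShell 6+ the parameter is accepted but has no effect.
$response = Invoke-WebRequest -Uri $baseuri -UseBasicParsing

# Split the raw HTML at each <li> tag and test every piece against both patterns.
# $matches holds the named groups of whichever pattern matched.
$updatelist = switch -Regex ($response.Content -split '<li>') {
    # Single build: "<version> <type>, <date>"
    'href="(?<URL>.+?)".+?(?<Version>\d{2}\.\d+?\.[\d\w]+?) (?<Type>[\w\s]+?), (?<Date>\w+? \d+, \d+)' {
        [PSCustomObject]@{
            Version = $matches.Version
            Type    = $matches.Type
            Date    = $matches.Date
            URL     = "$baseuri/{0}" -f $matches.URL
        }
    }

    # Separate builds: "<version> (Win), <version> (Mac) <type>, <date>"
    'href="(?<URL>.+?)".+?(?<Win>\d{2}\.\d+?\.[\d\w]+? \(Win\)), (?<Mac>\d{2}\.\d+?\.[\d\w]+? \(Mac\)) (?<Type>[\w\s]+?), (?<Date>\w+? \d+, \d+)' {
        $ht = [ordered]@{
            Version = $matches.Win
            Type    = $matches.Type
            Date    = $matches.Date
            URL     = "$baseuri/{0}" -f $matches.URL
        }

        # Emit the Windows entry, then reuse the same fields for the Mac entry.
        [PSCustomObject]$ht

        $ht.Version = $matches.Mac

        [PSCustomObject]$ht
    }
}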

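The parsed entries are stored in the variable $updatelist, one object per release; an entry that lists separate Win and Mac builds produces two objects.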
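
Since the goal is to track the newest release, one option is to sort $updatelist by version number. This is a minimal sketch, assuming version strings such as 20.012.20048 optionally followed by (Win) or (Mac); the helper name Get-LatestRelease is hypothetical, and entries that do not parse as a [version] (for example, builds containing letters) are skipped. It compares versions across every track listed on the page, so filter on Type or URL first if you only care about one track.

```powershell
# Hypothetical helper: return the entry with the highest version number.
function Get-LatestRelease {
    param(
        [Parameter(Mandatory)]
        [object[]]$Release
    )

    $Release |
        ForEach-Object {
            # Drop a trailing platform suffix such as " (Win)" or " (Mac)".
            $clean  = $_.Version -replace '\s*\(\w+\)$', ''
            $parsed = $null
            # TryParse fills $parsed and returns $false instead of throwing,
            # so unparseable versions are silently skipped.
            if ([version]::TryParse($clean, [ref]$parsed)) {
                [PSCustomObject]@{ Parsed = $parsed; Item = $_ }
            }
        } |
        Sort-Object -Property Parsed -Descending |
        Select-Object -First 1 -ExpandProperty Item
}

# Usage with the answer's output:
# Get-LatestRelease -Release $updatelist
```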

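
Because both patterns are tied to how the page was marked up at the time, it helps to check them offline before pointing them at the live site. This sketch wraps the same two patterns in a hypothetical function, ConvertFrom-ReleaseNoteHtml, and runs it on a made-up fragment; the version numbers, dates and file names in $sample are invented and only mimic the shapes the patterns expect. If Adobe changes the page layout, the patterns may need adjusting.

```powershell
# Hypothetical wrapper around the answer's two patterns; takes any HTML string,
# so no network access is needed.
function ConvertFrom-ReleaseNoteHtml {
    param(
        [Parameter(Mandatory)] [string]$Html,
        [Parameter(Mandatory)] [string]$BaseUri
    )

    $single = 'href="(?<URL>.+?)".+?(?<Version>\d{2}\.\d+?\.[\d\w]+?) (?<Type>[\w\s]+?), (?<Date>\w+? \d+, \d+)'
    $dual   = 'href="(?<URL>.+?)".+?(?<Win>\d{2}\.\d+?\.[\d\w]+? \(Win\)), (?<Mac>\d{2}\.\d+?\.[\d\w]+? \(Mac\)) (?<Type>[\w\s]+?), (?<Date>\w+? \d+, \d+)'

    switch -Regex ($Html -split '<li>') {
        $single {
            [PSCustomObject]@{
                Version = $matches.Version
                Type    = $matches.Type
                Date    = $matches.Date
                URL     = "$BaseUri/{0}" -f $matches.URL
            }
        }
        $dual {
            # One list item carries a Windows and a Mac build: emit one object each.
            foreach ($platform in 'Win', 'Mac') {
                [PSCustomObject]@{
                    Version = $matches[$platform]
                    Type    = $matches.Type
                    Date    = $matches.Date
                    URL     = "$BaseUri/{0}" -f $matches.URL
                }
            }
        }
    }
}

# Invented sample markup; not copied from Adobe's page.
$sample = @'
<ul>
<li><a href="single.html">99.000.10001 Planned update, Jan 5, 2099</a></li>
<li><a href="dual.html">99.000.20001 (Win), 99.000.20002 (Mac) Planned update, Jan 5, 2099</a></li>
</ul>
'@

$baseUri = 'https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC'
$result = @(ConvertFrom-ReleaseNoteHtml -Html $sample -BaseUri $baseUri)
$result | Format-Table
```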

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 