Powershell method of downloading file from a website with a changing URL?

I have been given a task that involves downloading a single file every day from a website. Let's call it "https://test.example.com". I have credentials that allow me to log in to the site, where a Flash interface then presents the files that are available for download. After the file is downloaded, it is processed in a variety of ways. I have already put together the PowerShell that handles all of that; I am just having a hard time automating the actual download of the file.

I used the Flash interface to download a few files while watching the network activity, and found that it is actually pulling the file from this URL:

https://test.example.com/link/EBDB7F67EF3B28XX99NCAD9920160423/file.zip

Therefore, I was able to put this together in order to automatically get the file via my PS script:

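# Direct link observed in the browser's network activity
$url = 'https://test.example.com/link/EBDB7F67EF3B28XX99NCAD9920160423/file.zip'
# Local path where the downloaded file is saved
$output = 'C:\Downloads\file.zip'

Invoke-WebRequest -Uri $url -OutFile $output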

However, the long string of letters and digits in the URL changes every day. The only discernible pattern I can find is that the last eight digits are always the date on which that particular file is posted (for example, 20160423).
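
As a minimal sketch of that pattern, assuming the suffix is formatted as yyyyMMdd (which matches the 20160423 example above), the date part can be generated and a candidate URL checked against it. The variable names and the regular expression are illustrative only. The leading characters are still unknown, so they are matched here, not generated.

```powershell
# Assumption: the trailing eight digits are the posting date as yyyyMMdd.
# In a daily job this would be (Get-Date); a fixed date is used here so the
# result lines up with the example URL from the question.
$postedOn   = Get-Date -Year 2016 -Month 4 -Day 23
$dateSuffix = $postedOn.ToString('yyyyMMdd')

# Any run of capital letters/digits, followed by the date suffix.
$pattern = '^https://test\.example\.com/link/[A-Z0-9]+{0}/file\.zip$' -f $dateSuffix

$candidate = 'https://test.example.com/link/EBDB7F67EF3B28XX99NCAD9920160423/file.zip'
$isMatch   = $candidate -match $pattern
```

A pattern like this only helps to recognize the right link if it shows up somewhere that can be read (for instance in a response body). It cannot produce the unknown prefix on its own.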

Is there a good way to approach this? I've been experimenting with wildcards and patterns, as well as checking the HTML for elements that I can filter, but I am having a hard time finding the correct solution.

This is very hard to automate. You can't drive Flash from the script unless it is specifically designed for that. As I see it now your only options are:

  1. Contact the site's developers if possible; maybe they can give you details on the function that generates the link. This gives me an idea: perhaps you can reverse engineer the Flash code to find the details of that function yourself. Use a Flash decompiler for this.
  2. Simulate the user browsing the Flash site. This can be done in one of the following ways:
    • AutoHotkey - you can record mouse clicks relative to the browser window and replay them as a script. Unless the Flash interface is too dynamic and unpredictable, it will work.
    • Sikuli - another automation tool, which relies on recognizing image segments on the screen.

All of the methods under option 2 produce fragile automation code, as they depend on browser settings (zoom, theme) and even OS settings. For this reason you will in all probability need to dedicate one machine to the task (a virtual machine, of course). Decompiling the Flash code and re-implementing the URL-generating code in PowerShell would make it 100% reliable.
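
If the generating logic does become known, the PowerShell side could look roughly like the sketch below. `Get-DailyFileUrl` and its `-Token` parameter are hypothetical names. How the token is actually produced is not known from the question, so here it is simply passed in, and the yyyyMMdd suffix is an assumption based on the example URL.

```powershell
# Hypothetical helper: builds the download URL from a token plus the date.
# The token must come from the real algorithm (site developers or the
# decompiled Flash code); nothing here computes it.
function Get-DailyFileUrl {
    param(
        [Parameter(Mandatory)] [string]   $Token,
        [datetime] $Date = (Get-Date)
    )
    # Assumption: the URL is <token><yyyyMMdd>, as in the example from the question.
    'https://test.example.com/link/{0}{1}/file.zip' -f $Token, $Date.ToString('yyyyMMdd')
}

# Rebuild the example URL from the question.
$exampleDate = Get-Date -Year 2016 -Month 4 -Day 23
$url = Get-DailyFileUrl -Token 'EBDB7F67EF3B28XX99NCAD99' -Date $exampleDate

# The download itself is then the same call as in the question:
# Invoke-WebRequest -Uri $url -OutFile 'C:\Downloads\file.zip'
```

Keeping the token as a parameter means the download script does not need to change once the real generation logic is plugged in.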

As somebody said in the comments, this is not a PowerShell question but a browser automation question.
