
PowerShell Download Txt File from WebSite

I am having trouble downloading a txt file from a website. The script below saves HTML code instead of the actual txt file and its contents.

$WebClient = New-Object System.Net.WebClient
$WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt")

Short Answer

The server is doing browser sniffing to send different responses based on the User-Agent header in your request. You can get the response you want by sending a canned user agent string:

$useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile c:\temp\CheckStatus.txt -UserAgent $useragent

Long Answer

The server responding to the URL you're hitting is doing browser sniffing to decide what content to return. If you give it a User-Agent header that it recognises, it will return the response you're expecting (i.e. the literal text "Azeemkhan-WaseemRaza").

If you don't include a User-Agent header (and $WebClient.DownloadFile doesn't include one), the server responds with an HTML page instead.

You can see this behaviour yourself if you install an HTTP trace tool like Fiddler. When you hit the page in a browser you see this HTTP request and response pair:

request

GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
Sec-Fetch-User: ?1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Cookie: SPSI=ee952ba44e33e958f963807ede78624b

response

HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:13:57 GMT
Content-Type: text/plain
Content-Length: 20
Connection: keep-alive
Last-Modified: Thu, 07 Nov 2019 16:15:48 GMT
Accept-Ranges: bytes
X-Cache: MISS

Azeemkhan-WaseemRaza

but when you use $WebClient.DownloadFile you see this instead:

request

GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org

response

HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:14:21 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: SPSI=9c24f8993046ef610e25cc727c4a4ae2; Path=/
Set-Cookie: adOtr=obsvl; Expires=Thu, 2 Aug 2001 20:47:11 UTC; Path=/
Set-Cookie: UTGv2=D-h4d40f620bfdd6c3b77b035ee99f96621134; Expires=Wed, 11-Nov-20 08:14:21 GMT; Path=/
cache-control: no-store, no-cache, max-age=0, must-revalidate, private,  max-stale=0, post-check=0, pre-check=0
Vary: Accept-Encoding
X-Cache: MISS
Accept-Ranges: bytes

5908
<!doctype html>
<head>
  <meta charset="utf-8">
  <meta http-equiv="x-ua-compatible" content="ie=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <title>StackPath</title>
  <style>
    * {
      box-sizing: border-box;
    }
... etc...
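
If you'd rather spot this from PowerShell than from a trace tool, the Content-Type header is the giveaway in the two responses above: text/plain for the real file, text/html for the StackPath page. Here is a minimal sketch of that check. Test-IsHtmlResponse is a hypothetical helper name I made up for illustration; it is not a built-in cmdlet.

```powershell
# Hypothetical helper: returns $true when a Content-Type value describes an HTML page.
function Test-IsHtmlResponse {
    param([string]$ContentType)

    # The media type is the part before any ";" parameters such as charset.
    $mediaType = ($ContentType -split ';')[0].Trim()
    return $mediaType -eq 'text/html'
}

# The two Content-Type values from the traces above:
Test-IsHtmlResponse -ContentType 'text/plain'                 # the expected txt file
Test-IsHtmlResponse -ContentType 'text/html; charset=UTF-8'   # the HTML page
```

After a WebClient download, the value to pass in should be available as $WebClient.ResponseHeaders["Content-Type"].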

The workaround is to include a recognised User-Agent header in your request, which is easier to do if you use Invoke-WebRequest like @BiNZGi suggested, rather than the WebClient class - see the "short answer" above for the code.
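
If you want to stay with WebClient, you can add the header yourself through its Headers collection. This is only a sketch: New-BrowserLikeWebClient is a hypothetical helper name, and I'm assuming the same Chrome user agent string from the short answer is one the server accepts.

```powershell
# Hypothetical helper: creates a WebClient that sends an explicit User-Agent header.
function New-BrowserLikeWebClient {
    param(
        [Parameter(Mandatory = $true)]
        [string]$UserAgent
    )

    $client = New-Object System.Net.WebClient
    # WebClient sends no User-Agent by default, so add one explicitly.
    $client.Headers.Add("User-Agent", $UserAgent)
    return $client
}

# Same user agent string as in the short answer.
$useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
$WebClient = New-BrowserLikeWebClient -UserAgent $useragent
```

You would then call $WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt") as in the question. If you reuse the same client for several downloads, it is safest to set the header again before each call.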

Also, note that this User-Agent sniffing is specific to the "thegivebackproject.org" website and isn't necessarily true for other websites - as a rule of thumb, you don't always need to include a User-Agent header.

You can use Invoke-WebRequest, which is simpler:

Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile D:\CheckStatus.txt
