I am having trouble downloading a txt file from a website. The script below downloads HTML code instead of the actual txt file and its contents.
$WebClient = New-Object System.Net.WebClient
$WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt")
Short Answer
The server is doing browser sniffing to send different responses based on the User-Agent header in your request. You can get the response you want by sending a canned user agent string:
$useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile c:\temp\CheckStatus.txt -UserAgent $useragent
Long Answer
The server responding to the URL you're hitting is doing browser sniffing to decide what content to return. If you give it a User-Agent header that it recognises, it will return the response you're expecting (i.e. the literal text "Azeemkhan-WaseemRaza").
If you don't include a User-Agent header (and $WebClient.DownloadFile doesn't include one), the server responds with an HTML page instead.
You can see this behaviour yourself if you install an HTTP trace tool like Fiddler. When you hit the page in a browser, you see this HTTP request and response pair:
request
GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
Sec-Fetch-User: ?1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Cookie: SPSI=ee952ba44e33e958f963807ede78624b
response
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:13:57 GMT
Content-Type: text/plain
Content-Length: 20
Connection: keep-alive
Last-Modified: Thu, 07 Nov 2019 16:15:48 GMT
Accept-Ranges: bytes
X-Cache: MISS
Azeemkhan-WaseemRaza
but when you use $WebClient.DownloadFile you see this instead:
request
GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
response
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:14:21 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: SPSI=9c24f8993046ef610e25cc727c4a4ae2; Path=/
Set-Cookie: adOtr=obsvl; Expires=Thu, 2 Aug 2001 20:47:11 UTC; Path=/
Set-Cookie: UTGv2=D-h4d40f620bfdd6c3b77b035ee99f96621134; Expires=Wed, 11-Nov-20 08:14:21 GMT; Path=/
cache-control: no-store, no-cache, max-age=0, must-revalidate, private, max-stale=0, post-check=0, pre-check=0
Vary: Accept-Encoding
X-Cache: MISS
Accept-Ranges: bytes
5908
<!doctype html>
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>StackPath</title>
<style>
* {
box-sizing: border-box;
}
... etc...
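The clearest difference between the two traces is the Content-Type header: text/plain for the real file, text/html for the substitute page. As a minimal sketch, a script could check that value before trusting a download. The function name Test-IsHtmlContentType is a hypothetical helper made up for illustration, not something from the original answer or from PowerShell itself.

```powershell
# Hypothetical helper (an assumption, not part of the original answer):
# returns $true when a Content-Type header value indicates an HTML page.
function Test-IsHtmlContentType {
    param([string]$ContentType)
    # Matches "text/html" with or without parameters such as "; charset=UTF-8".
    # -match is case-insensitive by default.
    return $ContentType -match '^\s*text/html\b'
}

# The two Content-Type values seen in the traces above.
$fromBrowser   = Test-IsHtmlContentType 'text/plain'
$fromWebClient = Test-IsHtmlContentType 'text/html; charset=UTF-8'
```

Here $fromBrowser is False and $fromWebClient is True, so a script could warn when it receives HTML where it expected the text file.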
The workaround is to include a recognised User-Agent header in your request, which is easier to do if you use Invoke-WebRequest, like @BiNZGi suggested, rather than the WebClient class; see the "short answer" above for the code.
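If you would rather keep the WebClient class, a header can be added through its Headers collection before the download. The following is a minimal sketch under that assumption; the function name New-WebClientWithUserAgent is hypothetical, and whether the server accepts any particular user agent string has only been shown for the Chrome string used in this answer.

```powershell
# Hypothetical helper (an assumption, not part of the original answer):
# creates a WebClient that sends the given User-Agent header.
function New-WebClientWithUserAgent {
    param([string]$UserAgent)
    $client = New-Object System.Net.WebClient
    # WebClient sends no User-Agent unless one is added explicitly.
    $client.Headers.Add("User-Agent", $UserAgent)
    return $client
}

# Same user agent string as in the short answer.
$useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
$WebClient = New-WebClientWithUserAgent -UserAgent $useragent

# The download itself is left commented out so the sketch has no side effects;
# the URL and path are the ones from the question.
# $WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt")
# $WebClient.Dispose()
```

Setting the header on the collection, rather than building the request by hand, keeps the rest of the original script unchanged.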
Also, note that this User-Agent sniffing is specific to the "thegivebackproject.org" website and isn't necessarily true of other websites; as a rule of thumb, you don't always need to include a User-Agent header.
You can use the easier Invoke-WebRequest:
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile D:\CheckStatus.txt