.NET Core 2.2 console application on Windows.
I'm exploring how to use HttpClient.GetAsync on a Stack Overflow share-style URL, e.g. https://stackoverflow.com/a/29809054/26086, which returns a 302 redirect to a URL with a hash (fragment) in it.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main()
        {
            var client = new HttpClient();

            // 1. Doesn't work - has a hash in the URL
            var url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054#29809054";
            HttpResponseMessage rm = await client.GetAsync(url);
            Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request

            // 2. Does work - no hash
            url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054";
            rm = await client.GetAsync(url);
            Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 200 OK

            // 3. Doesn't work, as the 302 redirect goes to the first URL above, with a hash
            url = "https://stackoverflow.com/a/29809054/26086";
            rm = await client.GetAsync(url);
            Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request
        }
    }
I'm crawling my blog which has many SO short codes in it.
Update/Workaround: With thanks to @rohancragg, I found that turning off AllowAutoRedirect and then reading the target URI from the Location header of the response worked:
    // as some autoredirects fail due to #fragments in url, handle redirects manually
    var handler = new HttpClientHandler { AllowAutoRedirect = false };
    var client = new HttpClient(handler);
    var url = "https://stackoverflow.com/a/29809054/26086";
    HttpResponseMessage rm = await client.GetAsync(url);
    // gives the desired new URL which can then GetAsync
    Uri u = rm.Headers.Location;
As @Damien_The_Unbeliever implies in a comment, you'll just need to strip off the hash and everything after it - all that does is tell the browser to jump to that anchor tag in the HTML page (see: https://w3schools.com/jsref/prop_anchor_hash.asp ).
You could also use the Uri class to parse the Uri and ignore any 'fragments': https://docs.microsoft.com/en-us/dotnet/api/system.uri.fragment
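As a minimal sketch of the Uri-based approach, the helper below drops the fragment with Uri.GetLeftPart(UriPartial.Query), which returns the scheme, authority, path and query but leaves out the fragment. The names UriFragments and StripFragment are made up for illustration and do not come from the question or from .NET.

```csharp
using System;

public static class UriFragments
{
    // Hypothetical helper: returns the same absolute URI without its "#fragment" part.
    public static Uri StripFragment(Uri uri)
    {
        if (uri == null) throw new ArgumentNullException(nameof(uri));
        if (!uri.IsAbsoluteUri) throw new ArgumentException("An absolute URI is required.", nameof(uri));

        // UriPartial.Query keeps scheme, authority, path and query; the fragment is left out.
        return new Uri(uri.GetLeftPart(UriPartial.Query));
    }
}
```

Passing the first URL from the question (the one ending in #29809054) should give back the second, hash-free URL, which is the one that returned 200.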
Because the share-style URLs are only ever going to return a 302, I'd suggest capturing the URI that the 302 points to and then doing as I suggest above: keep the path and ignore the fragment.
So you need some mechanism (which I'm just looking up!) to handle the 302 gracefully, and then follow it with option 2.
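One way this could look, as an untested sketch rather than a definitive implementation: follow redirects by hand and strip the fragment from each Location header before requesting it. It assumes the HttpClient was built on an HttpClientHandler with AllowAutoRedirect = false. The names ManualRedirects, ResolveLocation and GetFollowingRedirectsAsync, and the limit of 5 redirects, are assumptions made for this example.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ManualRedirects
{
    // Resolves a Location header against the request URI and drops any "#fragment".
    public static Uri ResolveLocation(Uri requestUri, Uri location)
    {
        // Location may be relative, so resolve it against the URI that was requested.
        Uri absolute = location.IsAbsoluteUri ? location : new Uri(requestUri, location);
        return new Uri(absolute.GetLeftPart(UriPartial.Query));
    }

    // Follows up to maxRedirects redirects by hand; the caller disposes the returned response.
    // Assumes the client's handler has AllowAutoRedirect = false.
    public static async Task<HttpResponseMessage> GetFollowingRedirectsAsync(
        HttpClient client, Uri uri, int maxRedirects = 5)
    {
        for (int i = 0; ; i++)
        {
            HttpResponseMessage response = await client.GetAsync(uri);
            int status = (int)response.StatusCode;
            bool isRedirect = status >= 300 && status < 400 && response.Headers.Location != null;

            // Stop on a non-redirect response, or when the redirect budget is used up.
            if (!isRedirect || i >= maxRedirects)
                return response;

            uri = ResolveLocation(uri, response.Headers.Location);
            response.Dispose();
        }
    }
}
```

With a client created as in the workaround in the question, awaiting ManualRedirects.GetFollowingRedirectsAsync(client, new Uri("https://stackoverflow.com/a/29809054/26086")) would request the fragment-free redirect target. Whether that request succeeds still depends on the server.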
Update: this looks relevant! "How can I get System.Net.Http.HttpClient to not follow 302 redirects?"
Update 2 Steve Guidi has a very important bit of advice in a comment here: https://stackoverflow.com/a/17758758/5351
In response to the advice that you need to use HttpResponseMessage.RequestMessage.RequestUri: it is very important to add HttpCompletionOption.ResponseHeadersRead as the second parameter of the GetAsync() call.
Disclaimer - I've not tried the above, this is just based on reading ;-)
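For reference, the advice above might be written as in the following sketch, which is equally untested. It assumes automatic redirects are left enabled on the client, so that RequestMessage.RequestUri on the response reflects the last request made. FinalUriProbe and GetFinalUriAsync are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class FinalUriProbe
{
    // Returns the URI of the request that produced the response,
    // i.e. the last one made after HttpClient has followed any redirects.
    public static async Task<Uri> GetFinalUriAsync(HttpClient client, Uri uri)
    {
        // ResponseHeadersRead completes the task once the headers arrive, without buffering the body.
        using (HttpResponseMessage response =
            await client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
        {
            return response.RequestMessage.RequestUri;
        }
    }
}
```

Note that this only reports where the client ended up. If the redirect target itself is rejected, as in case 3 of the question, the response would presumably still carry the error status.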
Maybe you need to encode your URL before sending the request, using the HttpUtility class; that way any special characters will be escaped.
    using System.Web;
    var url = $"https://myurl.com/{HttpUtility.UrlEncode("#1234567")}";