
HttpClient GetAsync with a hash in URL

.NET Core 2.2 console application on Windows.

I'm exploring how to use HttpClient.GetAsync on a Stack Overflow share-style URL, e.g. https://stackoverflow.com/a/29809054/26086, which returns a 302 redirect to a URL with a hash (fragment) in it.

static async Task Main()
{
    var client = new HttpClient();

    // 1. Doesn't work - has a hash in URL
    var url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054#29809054";
    HttpResponseMessage rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request

    // 2. Does work - no hash
    url = "https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054";
    rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 200 OK

    // 3. Doesn't work as the 302 redirect goes to the first URL above with a hash
    url = "https://stackoverflow.com/a/29809054/26086";
    rm = await client.GetAsync(url);
    Console.WriteLine($"Status code: {(int)rm.StatusCode}"); // 400 Bad Request
}

I'm crawling my blog, which contains many of these Stack Overflow short links.

Update/Workaround: With thanks to @rohancragg, I found that turning off AllowAutoRedirect and then reading the target URI from the Location header of the response worked.

// as some autoredirects fail due to #fragments in url, handle redirects manually
var handler = new HttpClientHandler { AllowAutoRedirect = false };
var client = new HttpClient(handler);

var url = "https://stackoverflow.com/a/29809054/26086";    
HttpResponseMessage rm = await client.GetAsync(url);

// the Location header gives the desired new URL, which can then be passed to GetAsync
Uri u = rm.Headers.Location;

As @Damien_The_Unbeliever implies in a comment, you just need to strip off the hash and everything after it. All the fragment does is tell the browser to jump to that anchor in the HTML page (see: https://w3schools.com/jsref/prop_anchor_hash.asp).

You could also use the Uri class to parse the URL and ignore any fragment: https://docs.microsoft.com/en-us/dotnet/api/system.uri.fragment
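As a minimal sketch of that idea (the class and method names here are hypothetical, chosen only for illustration), Uri.GetLeftPart with UriPartial.Query returns everything up to and including the query string, which leaves the fragment out:

```csharp
using System;

public static class FragmentStripper
{
    // Hypothetical helper: returns the same absolute URI without its #fragment.
    public static Uri StripFragment(Uri uri)
    {
        // UriPartial.Query keeps scheme, authority, path and query; the fragment is dropped.
        return new Uri(uri.GetLeftPart(UriPartial.Query));
    }

    public static void Main()
    {
        var withHash = new Uri("https://stackoverflow.com/questions/29808915/why-use-async-await-all-the-way-down/29809054#29809054");

        Console.WriteLine(withHash.Fragment);        // #29809054
        Console.WriteLine(StripFragment(withHash));  // same URL, without the hash part
    }
}
```

This assumes the input is an absolute URI; GetLeftPart throws for relative ones.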

Because the share-style URLs only ever return a 302, I'd suggest capturing the URI that the 302 points to and then, as suggested above, requesting just the path and ignoring the fragment.

So you need some mechanism (which I'm just looking up!) to handle the 302 gracefully, followed by option 2 from your question.
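A rough sketch of how those two steps might fit together, not a tested solution: auto-redirect is turned off, the Location header is read, the fragment is dropped, and the cleaned URL is requested. ManualRedirect and ResolveTarget are hypothetical names, and the sketch follows only a single redirect hop.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ManualRedirect
{
    // Hypothetical helper: resolves a Location header against the request URI
    // and drops any #fragment from the result.
    public static Uri ResolveTarget(Uri requestUri, Uri location)
    {
        // The Location header may be relative; resolve it against the original request if so.
        Uri absolute = location.IsAbsoluteUri ? location : new Uri(requestUri, location);
        return new Uri(absolute.GetLeftPart(UriPartial.Query));
    }

    public static async Task Main()
    {
        var handler = new HttpClientHandler { AllowAutoRedirect = false };
        using (var client = new HttpClient(handler))
        {
            var url = new Uri("https://stackoverflow.com/a/29809054/26086");
            HttpResponseMessage rm = await client.GetAsync(url);

            // Only follow the redirect if the server actually sent a Location header.
            Uri location = rm.Headers.Location;
            if (location != null)
            {
                rm = await client.GetAsync(ResolveTarget(url, location));
            }

            Console.WriteLine($"Status code: {(int)rm.StatusCode}");
        }
    }
}
```

A crawler that may hit chains of redirects would need to loop with a hop limit rather than follow one hop as here.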

Update: this looks relevant: "How can I get System.Net.Http.HttpClient to not follow 302 redirects?"

Update 2: Steve Guidi has a very important bit of advice in a comment here: https://stackoverflow.com/a/17758758/5351

In response to the advice that you need to use HttpResponseMessage.RequestMessage.RequestUri:

"it is very important to add HttpCompletionOption.ResponseHeadersRead as the second parameter of the GetAsync() call"


Disclaimer - I've not tried the above, this is just based on reading ;-)

Maybe you need to encode your URL before sending the request, using the HttpUtility class; this way any special characters will be escaped.

using System.Web;

var url = $"https://myurl.com/{HttpUtility.UrlEncode("#1234567")}";
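One thing that may be worth checking before relying on this: once the '#' is percent-encoded it is no longer a fragment delimiter but part of the path, so the server is asked for a different resource. The sketch below illustrates that, using Uri.EscapeDataString in place of HttpUtility.UrlEncode (a swap made here only to avoid the System.Web dependency); EncodedHashDemo is a hypothetical name and myurl.com is the placeholder host from the snippet above.

```csharp
using System;

public static class EncodedHashDemo
{
    // Hypothetical helper: percent-encodes a single URL segment.
    public static string Encode(string segment) => Uri.EscapeDataString(segment);

    public static void Main()
    {
        string encoded = Encode("#1234567");
        Console.WriteLine(encoded); // %231234567

        // With the hash encoded, the Uri class no longer sees a fragment at all.
        var uri = new Uri($"https://myurl.com/{encoded}");
        Console.WriteLine(uri.Fragment.Length); // 0
    }
}
```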
