
Issue with System.Uri

I'm seeing unexpected behavior from the System.Uri class. When a System.Uri instance is created from a URL string that contains patterns such as ... , ...# , or .# , System.Uri removes the repeated . characters.

This is weird, but I believe this behavior is based on RFC 2396.

The problem begins when I try to download the HTML from this URL: http://www.submarino.com.br/produto/1/23853463/mundo+segundo+steve+jobs,+o:+as+frases+mais+inspiradoras+ ...

System.Uri removes the repeated dots from it. Because the web site doesn't recognize the new URL, it redirects to the original URL, which System.Uri then rewrites again. The result is a "System.Net.WebException: Too many automatic redirections were attempted", and the page is never reached.

How can I solve this issue?

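You can use reflection to clear the parser flag responsible for this behavior. Run this once before you create the Uri. Note that it relies on non-public members of UriParser, so it may stop working in other framework versions: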

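// Requires at the top of the file: using System; using System.Reflection;
// Look up the non-public static UriParser.GetSyntax(string scheme) method.
MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.NonPublic);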
FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
if (getSyntax != null && flagsField != null)
{
    foreach (string scheme in new[] { "http", "https" })
    {
        UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { scheme });
        if (parser != null)
        {
            int flagsValue = (int)flagsField.GetValue(parser);
            // Clear the CanonicalizeAsFilePath attribute
            if ((flagsValue & 0x1000000) != 0)
                flagsField.SetValue(parser, flagsValue & ~0x1000000);
        }
    }
}
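
As a quick way to check whether a given runtime strips the dots at all, and whether the workaround above took effect, a sketch along these lines may help. The example.com URL and the class name are placeholders chosen for illustration; they are not taken from the question.

```csharp
using System;

class UriDotCheck
{
    static void Main()
    {
        // Placeholder URL (an assumption for illustration) ending in repeated dots.
        string original = "http://example.com/produto/titulo...";

        // Apply the reflection workaround before this line if you want to test it.
        Uri uri = new Uri(original);

        // Compare what System.Uri produced with what was passed in.
        bool preserved = string.Equals(uri.AbsoluteUri, original, StringComparison.Ordinal);

        Console.WriteLine("Input:  " + original);
        Console.WriteLine("Parsed: " + uri.AbsoluteUri);
        Console.WriteLine(preserved ? "Dots preserved" : "Dots were altered");
    }
}
```

Running it once without and once with the workaround shows whether the flag change makes a difference on the framework version in use.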

This behavior has been reported on Microsoft Connect before.
