I'm seeing unexpected behavior with the System.Uri class. When an instance of System.Uri is created and the URL string contains patterns such as "...", "...#", or ".#", System.Uri removes all the repeated "." characters.
This is weird, but I believe this behavior is based on RFC 2396.
The problem begins when I try to download the HTML from this URL: http://www.submarino.com.br/produto/1/23853463/mundo+segundo+steve+jobs,+o:+as+frases+mais+inspiradoras+ ... and System.Uri removes all the repeated "." characters. Because the website doesn't recognize the new URL, it redirects to the original URL. A "System.Net.WebException: Too many automatic redirections were attempted" is then thrown, and the page is never reached.
How can I solve this issue?
You can use reflection to clear the parser flag that causes this behavior (CanonicalizeAsFilePath). Run this before you create the Uri:
using System;
using System.Reflection;

// Look up non-public members of UriParser; they are implementation details and may be absent.
MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", BindingFlags.Static | BindingFlags.NonPublic);
FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);
if (getSyntax != null && flagsField != null)
{
    foreach (string scheme in new[] { "http", "https" })
    {
        UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { scheme });
        if (parser != null)
        {
            int flagsValue = (int)flagsField.GetValue(parser);
            // Clear the CanonicalizeAsFilePath attribute
            if ((flagsValue & 0x1000000) != 0)
                flagsField.SetValue(parser, flagsValue & ~0x1000000);
        }
    }
}
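Whether the dots are stripped depends on the framework version you run on, so it may be worth checking the result directly. The following is only a minimal sketch: the example.com URL is a placeholder that is not taken from the question, and the program simply compares the string passed in with what System.Uri reports back.

```csharp
using System;

class UriDotCheck
{
    static void Main()
    {
        // Placeholder URL (an assumption for illustration) ending in repeated dots.
        string raw = "http://example.com/path/title...";
        Uri uri = new Uri(raw);

        // Print what System.Uri produced so it can be compared with the input.
        Console.WriteLine("Input:       " + raw);
        Console.WriteLine("AbsoluteUri: " + uri.AbsoluteUri);
        Console.WriteLine(uri.AbsoluteUri == raw
            ? "The dots were preserved."
            : "The URL was altered by System.Uri.");
    }
}
```

Running it once without the workaround and once with it applied beforehand should show whether the change has any effect in your environment.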
This behavior has been reported on Microsoft Connect before.