
ASP.Net URL Encoding

I am implementing URL rewriting in ASP.NET, and my URLs are causing me a world of problems.

The URL is generated from a database of departments & categories. I want employees to be able to add items to the database with whatever special characters are appropriate without it breaking the site.

I am encoding the data before I construct the URLs.

There are several problems...

  1. IIS decodes the URL before it reaches .NET, making it impossible to properly parse anything with a "/" in it.
  2. ASP.NET gets confused by the URL, making "~" useless within certain pages.
  3. I migrated from the built-in test server to my local IIS server (XP machine), and any URL containing an encoded & (%26) gives me a "Bad Request" error.
  4. UrlEncode leaves some breaking characters untouched, such as '.'.
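To make points 1 and 4 concrete, here is a small sketch that prints what the framework encoders produce for the problem names. It uses Uri.EscapeDataString and System.Net.WebUtility.UrlEncode; the latter stands in for HttpUtility.UrlEncode, which lives in System.Web and writes lower-case hex escapes. The EncoderDemo class name is just for illustration.

```csharp
using System;
using System.Net;

public static class EncoderDemo
{
    public static void Print()
    {
        // "&" and "/" do get escaped, but IIS decodes %26 and %2F again
        // before ASP.NET sees the path.
        Console.WriteLine(Uri.EscapeDataString("Beverage & Bar"));    // Beverage%20%26%20Bar
        Console.WriteLine(Uri.EscapeDataString("Pastry/Decorating")); // Pastry%2FDecorating

        // "." is treated as safe and left alone, so IIS can still read it
        // as part of a file name.
        Console.WriteLine(WebUtility.UrlEncode("Bar.Fridge"));        // Bar.Fridge

        // Double encoding only moves the problem: the "%" itself becomes %25.
        Console.WriteLine(Uri.EscapeDataString(Uri.EscapeDataString("Pastry/Decorating"))); // Pastry%252FDecorating
    }
}
```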

I have two other related posts on this subject; at the time I only saw the small problems, not the big problem upstream. I've found some registry tricks that solve the "Bad Request" issue, but I'm going to deploy to a shared hosting environment, which makes them useless. I also know that this behavior is a fix for a security issue, so I don't want to bypass it without knowing what can of worms I'm opening.

Rather than trying to force .NET to pass me the raw URL, or overriding IIS settings, I'd like to make truly safe URLs in the first place.

I'll note I've tried AntiXss.UrlEncode, HttpUtility.UrlEncode and Uri.EscapeDataString. I've even tried silly things like double URL-encoding. Is there a utility that does what I need, or do I really need to roll my own? I'm even considering something hacky like replacing the % with an unusual string of characters. The end result should at least be readable, which was the point of using URL rewriting in the first place.

Sorry for the long post; I just wanted to make sure I included all the necessary details. I can't seem to find any relevant information on this, and it seems like it would be a common problem, so maybe I'm missing something big. Thanks for your help, and for your patience with the long explanation!


Edit for clarity:

When I say the URLs are built from a database, what I mean is that the directory structure is constructed from the departments and categories in my database.

Some example URLs:

Mystore/Refrigeration/Bar+Fridge.aspx
Mystore/Cooking+Equipment.aspx
Mystore/Kitchen/Cutting+Boards.aspx

The problems come in when I use a department like "Beverage & Bar" or "Pastry/Decorating" to construct my URL. Despite being encoded first, these cause the issues described above.

My handlers are already implemented and working fine except for the special character encoding issues.

You should consider adding a table, linked to your category/department table, that holds a unique URL for each category. Then you can use a special routine to generate the URLs. This can be a SQL scalar function or a CLR function, but one of the things it would do is normalize the URL for the web. You can convert "Beverage & Bar" to "Beverage-And-Bar" and "Pastry / Decorating" to "Pastry-Decorating". Above all, the routine needs to replace every character that is not safe in a URL with something else. An example is this:

using System.Text.RegularExpressions;

public static class URL
{
    // Foot and inch marks after a digit: 6' -> 6-ft-, 6'' or 6" -> 6-in-
    static readonly Regex feet = new Regex(@"([0-9]\s?)'([^'])", RegexOptions.Compiled);
    static readonly Regex inch1 = new Regex(@"([0-9]\s?)''", RegexOptions.Compiled);
    static readonly Regex inch2 = new Regex(@"([0-9]\s?)""", RegexOptions.Compiled);
    // #5 -> num-5, $5 -> 5-dollar-, 5% -> 5-percent-
    static readonly Regex num = new Regex(@"#([0-9]+)", RegexOptions.Compiled);
    static readonly Regex dollar = new Regex(@"[$]([0-9]+)", RegexOptions.Compiled);
    static readonly Regex percent = new Regex(@"([0-9]+)%", RegexOptions.Compiled);
    // Whitespace and separator characters become hyphens
    static readonly Regex sep = new Regex(@"[\s_/\\+:.]", RegexOptions.Compiled);
    // Anything still not a letter, digit or hyphen is dropped
    static readonly Regex empty = new Regex(@"[^-A-Za-z0-9]", RegexOptions.Compiled);
    // Runs of hyphens collapse to a single hyphen
    static readonly Regex extra = new Regex(@"[-]+", RegexOptions.Compiled);

    public static string PrepareURL(string str)
    {
        // Invariant lower-casing avoids culture surprises (e.g. the Turkish dotless i)
        str = str.Trim().ToLowerInvariant();
        str = str.Replace("&", "and");

        // $2 keeps the character that followed the foot mark
        str = feet.Replace(str, "$1-ft-$2");
        str = inch1.Replace(str, "$1-in-");
        str = inch2.Replace(str, "$1-in-");
        str = num.Replace(str, "num-$1");

        str = dollar.Replace(str, "$1-dollar-");
        str = percent.Replace(str, "$1-percent-");

        str = sep.Replace(str, "-");

        str = empty.Replace(str, string.Empty);
        str = extra.Replace(str, "-");

        str = str.Trim('-');
        return str;
    }
}

You could make this a SQL CLR function, or run URL generation as a separate process. Then, to implement the mapping, you would map the entire URL directly to a category ID. This approach is better in the long run for several reasons. First, you are not regenerating URLs all the time: you generate them once and they stay static, so you don't have to worry about your routine changing later and GoogleBot no longer being able to find the old URLs. Also, if you get a collision, you may have spotted a potential duplicate category name, because colliding names would differ only by special characters. Finally, you can always view your URLs in the database without having to run the mapping function.
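A minimal sketch of the "generate once, then look up" table, assuming the categories are already loaded as (ID, name) pairs and that toSlug is whatever normalizer you use, such as the PrepareURL routine. CategoryUrlTable is a hypothetical name; on a real site the resulting map would be saved to the URL table rather than kept in memory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds slug -> category ID and reports collisions.
public static class CategoryUrlTable
{
    // toSlug is an assumption: plug in your own normalizer (e.g. PrepareURL).
    public static Dictionary<string, int> Build(
        IEnumerable<KeyValuePair<int, string>> categories,
        Func<string, string> toSlug,
        List<string> collisions)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var category in categories)
        {
            string slug = toSlug(category.Value);
            if (map.TryGetValue(slug, out int existingId))
            {
                // Keep the first owner and let a person decide what to rename.
                collisions.Add($"{category.Value} (id {category.Key}) collides with id {existingId} on '{slug}'");
                continue;
            }
            map.Add(slug, category.Key);
        }
        return map;
    }
}
```

Reporting the collision instead of silently appending a number keeps a person in the loop, which fits the point that a collision usually means a near-duplicate category name.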

I implement my URL rewrite in the Global.asax file, at the authenticate-request stage, because I have some security checks there. This is where I take the raw URL and do the database lookup. The handler then rewrites the path to the .aspx page, and all the parameters are passed through the query string, so no encoding is necessary.
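A rough sketch of that lookup step, assuming the slugs are stored as full paths without the .aspx extension and that a dictionary stands in for the database query. Category.aspx and RewriteResolver are hypothetical names; in Global.asax you would pass the returned target to HttpContext.RewritePath.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resolver: raw request path -> internal page plus query string.
public static class RewriteResolver
{
    public static bool TryResolve(string rawPath, IDictionary<string, int> slugToId, out string target)
    {
        target = null;

        // Drop any query string, the surrounding slashes and the .aspx extension.
        int q = rawPath.IndexOf('?');
        string path = (q >= 0 ? rawPath.Substring(0, q) : rawPath).Trim('/');
        if (path.EndsWith(".aspx", StringComparison.OrdinalIgnoreCase))
            path = path.Substring(0, path.Length - ".aspx".Length);

        // The dictionary stands in for the database lookup (assumption).
        if (!slugToId.TryGetValue(path, out int id))
            return false; // not a category URL; let the request through untouched

        // Only the numeric ID travels in the query string, so no name needs encoding.
        target = "~/Category.aspx?id=" + id;
        return true;
    }
}
```

This version drops any query string on the incoming URL; if your pages rely on one, append it to the target.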

However, if you are using the URL to actually change data, then I can see you will have huge problems, as you are effectively using an HTTP GET to change the database. That is usually considered a bad idea, and not something I do.

I only use POST requests for any database manipulation. This keeps the URL clean, as all the data is in the page form.

The only issue I had was setting Page.Form.Action to the correct URL, which in most cases is the raw URL.

If it's the category names that are causing the issue, then perhaps you should restrict the names to alphanumeric characters only and swap spaces for "-". IIS will throw a wobbly with periods (".") because it treats them as part of a file name.

PS: IIS does not understand the tilde ("~"); it is ASP.NET that resolves it, for example in server controls or through ResolveUrl. So if you use it in a plain anchor tag it will not work as expected, and you should use the application root instead of the tilde.

Edit:

OK, it looks like the issue is that IIS gives its own meaning to certain characters such as ".", "/" and "&". Even if you URL-encode them, IIS will still decode them and apply that meaning. So consider removing them:

Beverage & bar becomes BeverageBar

Pastry / decorating becomes PastryDecorating.

This keeps your URLs clean, but it does mean an extra column in the database so you can check the URL against this shortened category name.
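A sketch of building that shortened name, assuming only ASCII letters and digits should survive. ShortCategoryName is a hypothetical helper: each run of other characters acts as a word break, and the first letter of every word is capitalized, matching the BeverageBar and PastryDecorating examples.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for the extra "short name" column.
public static class ShortCategoryName
{
    // One or more characters that are not ASCII letters or digits.
    static readonly Regex NonAlphaNumeric = new Regex("[^A-Za-z0-9]+");

    public static string Create(string name)
    {
        var sb = new StringBuilder();
        foreach (string word in NonAlphaNumeric.Split(name))
        {
            if (word.Length == 0) continue; // produced by leading/trailing separators
            sb.Append(char.ToUpperInvariant(word[0]));
            sb.Append(word, 1, word.Length - 1);
        }
        return sb.ToString();
    }
}
```

Accented letters are treated as separators too, so names containing them would need a decision of their own (for example stripping diacritics first).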

I'm having the exact same problem. Thanks for writing it up so nicely. It actually helped me to understand the problem better.

I had some other considerations, however. One of my goals is to support any character in the article title that the URL is based on. Additionally, I want to ensure uniqueness in the encoding and a two-way encode/decode process.

So I did some manual encoding to solve the problem. This won't completely eliminate percent-encoding, but it greatly reduces it and keeps users from generating an inaccessible URL. My process starts with the Server.UrlEncode function, but that alone doesn't eliminate the problems in the URL. Because IIS decodes the URL before passing it to the application, certain characters will break it with a "dangerous request" exception. These characters include +, &, /, !, *, ., ( and ). So for those characters, plus others I would like to make more readable, I do a second round of encoding that replaces each one with a short word, which gives a more usable URL. Encoding is also hard because of the limited set of characters allowed in a URL. So before encoding I convert all letters to upper case, and the replacement words are lower case. This means the encoding can't be fully reversed (the original casing is lost), but I can easily do a match in the database or in code by upper-casing the value I want to match.

Well, here is my code. Feedback would be appreciated. Oh, and this is in VB, but it should transfer over to C# easily enough.

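Imports System.Web

Public Module UrlEncoding
    ' EncodeForUrl is a hypothetical name; the snippet originally showed only the body.
    Public Function EncodeForUrl(ByVal strStringToEncode As String) As String
        ' Upper-case first so the lower-case replacement words below cannot
        ' collide with text from the original string (casing is deliberately lost).
        Dim strReturn As String = Trim(strStringToEncode).ToUpperInvariant()

        ' HttpUtility.UrlEncode is the static counterpart of Server.UrlEncode:
        ' spaces become "+" and escapes use lower-case hex such as %2f.
        strReturn = HttpUtility.UrlEncode(strReturn)

        ' Literal hyphens become "dash" so that "-" can stand in for spaces.
        strReturn = strReturn.Replace("-", "dash").Replace("+", "-")

        ' Replace characters that IIS decodes or rejects with short readable words.
        strReturn = strReturn.Replace("%26", "and").
                            Replace("%2f", "or").
                            Replace("!", "excl").
                            Replace("*", "star").
                            Replace("%27", "apos").
                            Replace("(", "lprn").
                            Replace(")", "rprn").
                            Replace("%3b", "semi").
                            Replace("%3a", "coln").
                            Replace("%40", "at").
                            Replace("%3d", "eq").
                            Replace("%2b", "plus").
                            Replace("%24", "dols").
                            Replace("%25", "pct").
                            Replace("%2c", "coma").
                            Replace("%3f", "query").
                            Replace("%23", "hash").
                            Replace("%5b", "lbrk").
                            Replace("%5d", "rbrk").
                            Replace(".", "dot").
                            Replace("%3e", "gt").
                            Replace("%3c", "lt")

        Return strReturn
    End Function
End Module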
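Since the VB is meant to carry over to C#, here is a sketch of a port. It assumes System.Net.WebUtility.UrlEncode in place of Server.UrlEncode; WebUtility writes upper-case hex escapes (%2F rather than %2f), so the lookup keys are upper-case here. ReadableUrlEncoder is a hypothetical name.

```csharp
using System.Net;

// Hypothetical C# port of the VB routine; WebUtility.UrlEncode stands in
// for Server.UrlEncode and emits UPPER-case hex escapes.
public static class ReadableUrlEncoder
{
    // Applied in order after percent-encoding. The words are lower-case, so
    // they cannot clash with the upper-cased input text.
    static readonly string[,] Replacements =
    {
        { "%26", "and" }, { "%2F", "or" }, { "!", "excl" }, { "*", "star" },
        { "%27", "apos" }, { "(", "lprn" }, { ")", "rprn" }, { "%3B", "semi" },
        { "%3A", "coln" }, { "%40", "at" }, { "%3D", "eq" }, { "%2B", "plus" },
        { "%24", "dols" }, { "%25", "pct" }, { "%2C", "coma" }, { "%3F", "query" },
        { "%23", "hash" }, { "%5B", "lbrk" }, { "%5D", "rbrk" }, { ".", "dot" },
        { "%3E", "gt" }, { "%3C", "lt" }
    };

    public static string Encode(string input)
    {
        // Upper-case first (casing is deliberately lost), then percent-encode.
        string s = WebUtility.UrlEncode(input.Trim().ToUpperInvariant());

        // Literal hyphens become "dash" so that "-" can stand in for spaces ("+").
        s = s.Replace("-", "dash").Replace("+", "-");

        for (int i = 0; i < Replacements.GetLength(0); i++)
            s = s.Replace(Replacements[i, 0], Replacements[i, 1]);
        return s;
    }
}
```

For example, "Beverage & Bar" comes out as BEVERAGE-and-BAR and "Pastry/Decorating" as PASTRYorDECORATING.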

I guess you're looking for HttpUtility.UrlEncode and HttpUtility.HtmlDecode:

string url = "http://www.google.com/search?q=" + HttpUtility.UrlEncode("Example");

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.