
ASP.Net URL Encoding

I am implementing URL rewriting in ASP.net and my URLs are causing me a world of problems.

The URL is generated from a database of departments & categories. I want employees to be able to add items to the database with whatever special characters are appropriate without it breaking the site.

I am encoding the data before I construct the URLs.

There are several problems...

  1. IIS decodes the URL before it reaches .net, making it impossible to properly parse anything with a "/" in it.
  2. ASP.net gets confused by the URL, making "~" useless within certain pages.
  3. I migrated from the built-in test server to my local IIS server (XP machine) and any URL containing an encoded & (%26) gives me a "Bad Request" error.
  4. UrlEncode leaves some breaking characters untouched, such as '.'.

I did have two other related posts on this subject; at the time I only saw the small problems, not the big problem upstream. I've found some registry tricks to solve the "Bad Request" issue, but I'm going to be deploying to a shared hosting environment, which makes them useless. I also know that this behavior is a fix for some security issue, so I don't want to bypass it without knowing what can of worms I'm opening.

Rather than trying to force .net to pass me the raw URL, or override IIS settings, I'd like to make truly safe URLs in the first place.

I'll note I've tried AntiXss.UrlEncode, HttpUtility.UrlEncode, and Uri.EscapeDataString. I've even tried stupid things like double URL encoding. Is there a utility that does what I need, or do I really need to roll my own? I'm even considering doing something hacky like replacing the % with an unusual string of characters. The end result should be at least readable, which was the point of using URL rewriting in the first place.

Sorry for the long post. I just wanted to make sure that I've included all the necessary details. I can't seem to find any relevant information on this, and it seems like it would be a common problem, so maybe I'm missing something big. Thanks for your help, and patience with the long explanation!


Edit for clarity:

When I say the URLs are being built from a database, what I mean is that the directory structure is constructed from the departments and categories in my database.

Some example URLs:

Mystore/Refrigeration/Bar+Fridge.aspx
Mystore/Cooking+Equipment.aspx
Mystore/Kitchen/Cutting+Boards.aspx

The problems come in when I use a department like "Beverage & Bar" or "Pastry/Decorating" to construct my URL. Despite being encoded first, these cause the aforementioned issues.

My handlers are already implemented and working fine except for the special character encoding issues.

You should consider having a table off of your category/department table which holds a unique URL for each category. Then you can use a special routine to generate the URLs. This can be a SQL scalar function or a CLR function, but one of the things it would do is normalize the URL for the web. You can convert "Beverage & Bar" to "Beverage-And-Bar" and "Pastry / Decorating" to "Pastry-Decorating". Mainly, the routine needs to replace every character that is not valid in an HTTP URL with something else. An example is this:

using System.Text.RegularExpressions;

public static class URL
{
    // Measurement marks that follow a digit: 6' -> 6-ft-, 6'' or 6" -> 6-in-.
    // The lookahead keeps a single quote from matching the first half of ''
    // and, unlike a consuming group, does not swallow the next character.
    static readonly Regex feet = new Regex(@"([0-9]\s?)'(?!')", RegexOptions.Compiled);
    static readonly Regex inch1 = new Regex(@"([0-9]\s?)''", RegexOptions.Compiled);
    static readonly Regex inch2 = new Regex(@"([0-9]\s?)""", RegexOptions.Compiled);
    // Symbols attached to numbers: #5, $5, 5%.
    static readonly Regex num = new Regex(@"#([0-9]+)", RegexOptions.Compiled);
    static readonly Regex dollar = new Regex(@"[$]([0-9]+)", RegexOptions.Compiled);
    static readonly Regex percent = new Regex(@"([0-9]+)%", RegexOptions.Compiled);
    // Characters treated as word separators.
    static readonly Regex sep = new Regex(@"[\s_/\\+:.]", RegexOptions.Compiled);
    // Anything that is not a letter, digit, or hyphen is dropped.
    static readonly Regex empty = new Regex(@"[^-A-Za-z0-9]", RegexOptions.Compiled);
    // Runs of hyphens collapse to one.
    static readonly Regex extra = new Regex(@"[-]+", RegexOptions.Compiled);

    public static string PrepareURL(string str)
    {
        str = str.Trim().ToLowerInvariant();
        str = str.Replace("&", "and");

        str = feet.Replace(str, "$1-ft-");
        str = inch1.Replace(str, "$1-in-");
        str = inch2.Replace(str, "$1-in-");
        str = num.Replace(str, "num-$1");

        str = dollar.Replace(str, "$1-dollar-");
        str = percent.Replace(str, "$1-percent-");

        str = sep.Replace(str, "-");

        str = empty.Replace(str, string.Empty);
        str = extra.Replace(str, "-");

        str = str.Trim('-');
        return str;
    }
}

You could expose this as a SQL function (for example, a CLR function), or run URL generation as a separate process. Then, to implement mapping, you would map the entire URL directly to a category ID. This approach is better in the long run for several reasons. First, you are not regenerating URLs on every request; you generate them once and they stay static, so you don't have to worry about your routine changing and Googlebot no longer being able to find the old URLs. Also, a collision can alert you to a potential duplicate category name, because two names that collide differ only by special characters. Finally, you can always view your URLs in the database without having to run the mapping function.
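The lookup side of this idea might look roughly like the sketch below. It assumes the stored URLs have been loaded into memory; `SlugTable`, `TryAdd`, and `TryResolve` are hypothetical names, and a real site would back this with the database table described above rather than a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the URL table described above.
public sealed class SlugTable
{
    // Stored URL (for example "refrigeration/bar-fridge") -> category ID.
    // The comparer ignores case so differently cased links still resolve.
    private readonly Dictionary<string, int> idByUrl =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // Registers a URL once. Returns false when the URL already belongs to a
    // different category, which is the collision case worth reviewing by hand.
    public bool TryAdd(string url, int categoryId)
    {
        int existing;
        if (idByUrl.TryGetValue(url, out existing))
        {
            return existing == categoryId;
        }
        idByUrl.Add(url, categoryId);
        return true;
    }

    // Maps an incoming URL back to its category ID, if it is known.
    public bool TryResolve(string url, out int categoryId)
    {
        return idByUrl.TryGetValue(url, out categoryId);
    }
}
```

Because the URL is stored rather than recomputed, changing the generation routine later does not affect URLs that have already been published.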

I have a URL rewrite that I implement in the global.asax file, in the authenticate-request event handler, as I have some security. This is where I take the raw URL and then do the DB lookup. This then rewrites the path to the aspx page, and all the parameters are passed through the query string. No encoding is necessary.

However, if you are using the URL to actually change data, then I can see that you will have huge problems, as you are effectively using HTTP GET to change the database. That is usually considered a bad idea, and not something I do.

I only use a POST request to do any database manipulation. This keeps the URL clean, as all the data is in the page form.

The only issue I had was setting the correct URL on page.form.action, which in most cases is the raw URL.

If it's the category names that are causing the issue, then perhaps you should restrict the names to alphanumeric characters only and swap spaces for "-". IIS will throw a wobbly with periods ".", as it looks for file names.

PS: IIS does not understand the tilde "~"; it is resolved by ASP.NET on the server side. So if you use it in a plain anchor tag it will not work as expected, and you should use the application root instead of the tilde.

Edit:

OK, it looks like the problem is IIS having issues with certain characters such as ".", "/" and "&". Even if you URL-encode these, IIS will still apply its own meaning to them. As such, consider removing them, so:

"Beverage & Bar" becomes BeverageBar

"Pastry / Decorating" becomes PastryDecorating.

This will keep your URLs clean, but it does mean an extra column in the database so you can check the URL against this shortened category name.
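As a rough sketch of that shortening step, assuming "removing them" means dropping every character that is not an ASCII letter or digit (`CategoryKey` and `Shorten` are hypothetical names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CategoryKey
{
    // Matches every character that is not an ASCII letter or digit.
    private static readonly Regex NonAlphanumeric = new Regex("[^A-Za-z0-9]");

    // Produces the shortened name to store in the extra column.
    public static string Shorten(string categoryName)
    {
        return NonAlphanumeric.Replace(categoryName, string.Empty);
    }
}
```

With this, `CategoryKey.Shorten("Beverage & Bar")` gives `BeverageBar`. Two different names can shorten to the same key, so the extra column would need a uniqueness check.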

I'm having the exact same problem. Thanks for writing it up so nicely. It actually helped me to understand the problem better.

I had some other considerations, however. One of my goals is to support any character appearing in the URL, which is based on the title of an article. Additionally, I want to ensure uniqueness in the encoding and a two-way encode/decode process.

So I did some manual encoding to solve the problem. This won't completely eliminate percent-encoding, but it will greatly reduce it and keep users from generating an inaccessible URL. My process starts with the Server.UrlEncode function, but this alone doesn't eliminate the problems in the URL. Because IIS decodes the URL and then passes it to the application, certain characters will break it with a dangerous-request exception. These characters include +, &, /, !, *, ., ( and ). So for those characters, plus other characters I would like to make more readable, I apply a second encoding to get a more usable URL. Encoding is also hard because of the limited number of characters that are allowed in a URL. So prior to encoding I made all letters capital and then did the encoding in lower case. This keeps it from being totally decodable, but I can easily do a match in the database or in code by making the value I wish to match upper case.

Well, here is my code. Feedback would be appreciated. Oh yeah, this is in VB, but things should transfer over to C# easily enough.

' Upper-case first, as described above, so the lower-case tokens below
' cannot be confused with letters from the original text.
Dim strReturn As String = Trim(strStringToEncode).ToUpper()
strReturn = Server.UrlEncode(strReturn)

' UrlEncode leaves "-" alone and turns spaces into "+":
' literal dashes become "dash", then spaces become "-".
strReturn = strReturn.Replace("-", "dash").Replace("+", "-")

' Replace the remaining troublesome characters with readable tokens.
strReturn = strReturn.Replace("%26", "and").
                    Replace("%2f", "or").
                    Replace("!", "excl").
                    Replace("*", "star").
                    Replace("%27", "apos").
                    Replace("(", "lprn").
                    Replace(")", "rprn").
                    Replace("%3b", "semi").
                    Replace("%3a", "coln").
                    Replace("%40", "at").
                    Replace("%3d", "eq").
                    Replace("%2b", "plus").
                    Replace("%24", "dols").
                    Replace("%25", "pct").
                    Replace("%2c", "coma").
                    Replace("%3f", "query").
                    Replace("%23", "hash").
                    Replace("%5b", "lbrk").
                    Replace("%5d", "rbrk").
                    Replace(".", "dot").
                    Replace("%3e", "gt").
                    Replace("%3c", "lt")

Return strReturn
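For readers working in C#, a port might look something like the sketch below. It is only an approximation under stated assumptions: `ReadableUrlEncoder` and `Encode` are hypothetical names, `HttpUtility.UrlEncode` stands in for `Server.UrlEncode`, and the token table relies on the encoder emitting lower-case hex digits (for example `%2f`), as the VB snippet does. The input is upper-cased first, following the prose above.

```csharp
using System;
using System.Web;

public static class ReadableUrlEncoder
{
    // Same substitutions as the VB snippet, applied in the same order.
    private static readonly string[,] Tokens =
    {
        { "%26", "and" }, { "%2f", "or" }, { "!", "excl" }, { "*", "star" },
        { "%27", "apos" }, { "(", "lprn" }, { ")", "rprn" }, { "%3b", "semi" },
        { "%3a", "coln" }, { "%40", "at" }, { "%3d", "eq" }, { "%2b", "plus" },
        { "%24", "dols" }, { "%25", "pct" }, { "%2c", "coma" }, { "%3f", "query" },
        { "%23", "hash" }, { "%5b", "lbrk" }, { "%5d", "rbrk" }, { ".", "dot" },
        { "%3e", "gt" }, { "%3c", "lt" }
    };

    public static string Encode(string text)
    {
        // Upper-case the text so the lower-case tokens stay distinguishable.
        string s = HttpUtility.UrlEncode(text.Trim().ToUpperInvariant());

        // Literal dashes become "dash"; encoded spaces ("+") become "-".
        s = s.Replace("-", "dash").Replace("+", "-");

        for (int i = 0; i < Tokens.GetLength(0); i++)
        {
            s = s.Replace(Tokens[i, 0], Tokens[i, 1]);
        }
        return s;
    }
}
```

Under these assumptions, `ReadableUrlEncoder.Encode("Beverage & Bar")` yields `BEVERAGE-and-BAR`. As with the VB version, the result is matched against upper-cased values rather than decoded back to the original casing.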

I guess you are looking for HttpUtility.UrlEncode and HttpUtility.HtmlDecode.

string url = "http://www.google.com/search?q=" + HttpUtility.UrlEncode("Example");
