Does HttpUtility.UrlEncode match the spec for 'x-www-form-urlencoded'?

Per MSDN:

URLEncode converts characters as follows:

  • Spaces ( ) are converted to plus signs (+).
  • Non-alphanumeric characters are escaped to their hexadecimal representation.

This is similar to, but not exactly the same as, the W3C definition:

application/x-www-form-urlencoded

This is the default content type. Forms submitted with this content type must be encoded as follows:

  1. Control names and values are escaped. Space characters are replaced by '+', and then reserved characters are escaped as described in RFC1738, section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').

  2. The control names/values are listed in the order they appear in the document. The name is separated from the value by '=' and name/value pairs are separated from each other by '&'.

My question is: has anyone done the work to determine whether URLEncode produces valid x-www-form-urlencoded data?

Well, the documentation you linked to is for IIS 6 Server.UrlEncode, but your title seems to ask about .NET System.Web.HttpUtility.UrlEncode. Using a tool like Reflector, we can see the implementation of the latter and determine whether it meets the W3C spec.

Here is the encoding routine that is ultimately called (note that it is defined for an array of bytes; the overloads that take strings eventually convert those strings to byte arrays and call this method). You would call this for each control name and value (to avoid escaping the reserved characters = and & used as separators).

protected internal virtual byte[] UrlEncode(byte[] bytes, int offset, int count)
{
    if (!ValidateUrlEncodingParameters(bytes, offset, count))
    {
        return null;
    }
    int num = 0;
    int num2 = 0;
    for (int i = 0; i < count; i++)
    {
        char ch = (char) bytes[offset + i];
        if (ch == ' ')
        {
            num++;
        }
        else if (!HttpEncoderUtility.IsUrlSafeChar(ch))
        {
            num2++;
        }
    }
    if ((num == 0) && (num2 == 0))
    {
        return bytes;
    }
    byte[] buffer = new byte[count + (num2 * 2)];
    int num4 = 0;
    for (int j = 0; j < count; j++)
    {
        byte num6 = bytes[offset + j];
        char ch2 = (char) num6;
        if (HttpEncoderUtility.IsUrlSafeChar(ch2))
        {
            buffer[num4++] = num6;
        }
        else if (ch2 == ' ')
        {
            buffer[num4++] = 0x2b;
        }
        else
        {
            buffer[num4++] = 0x25;
            buffer[num4++] = (byte) HttpEncoderUtility.IntToHex((num6 >> 4) & 15);
            buffer[num4++] = (byte) HttpEncoderUtility.IntToHex(num6 & 15);
        }
    }
    return buffer;
}

public static bool IsUrlSafeChar(char ch)
{
    if ((((ch >= 'a') && (ch <= 'z')) || ((ch >= 'A') && (ch <= 'Z'))) || ((ch >= '0') && (ch <= '9')))
    {
        return true;
    }
    switch (ch)
    {
        case '(':
        case ')':
        case '*':
        case '-':
        case '.':
        case '_':
        case '!':
            return true;
    }
    return false;
}

The first part of the routine counts the number of characters that need to be replaced (spaces and non-URL-safe characters). The second part of the routine allocates a new buffer and performs the replacements:

  1. URL-safe characters are kept as is: a-z A-Z 0-9 ()*-._!
  2. Spaces are converted to plus signs
  3. All other characters are converted to %HH
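
To illustrate the earlier point about calling the encoder once per control name and value, here is a minimal sketch. The FormBody class and its Build method are hypothetical names made up for this example; only HttpUtility.UrlEncode(string) comes from the framework, and it is assumed that System.Web is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

// Hypothetical helper, not part of the framework.
public static class FormBody
{
    // Encode every name and value on its own, then add the separators,
    // so that the literal '=' and '&' delimiters are never escaped.
    public static string Build(IEnumerable<KeyValuePair<string, string>> controls)
    {
        return string.Join("&", controls.Select(c =>
            HttpUtility.UrlEncode(c.Key) + "=" + HttpUtility.UrlEncode(c.Value)));
    }
}
```

With a control named "full name" whose value is "a&b=c", the space in the name becomes a plus sign and the & and = inside the value are percent-escaped, while the single = separator between name and value is left alone.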

RFC1738 states:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

On the other hand, characters that are not required to be encoded
(including alphanumerics) may be encoded within the scheme-specific
part of a URL, as long as they are not being used for a reserved
purpose.

The set of URL-safe characters allowed by UrlEncode is a subset of the special characters defined in RFC1738. Namely, the characters $ and , are missing (as are ' and +; a literal + has to be escaped in any case, because an unescaped + stands for a space) and will be encoded by UrlEncode even though the spec says they are safe. Since they may be used unencoded (not must), it still meets the spec to encode them (and the second paragraph states that explicitly).
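
One way to double-check which RFC1738 special characters fall outside the safe set is to compute the difference directly. This is only a sketch: the SafeSetCheck class is a hypothetical name, and the punctuation string is copied by hand from the IsUrlSafeChar listing above rather than read from the framework.

```csharp
using System;
using System.Linq;

// Hypothetical helper for comparing the two character sets.
public static class SafeSetCheck
{
    // Punctuation accepted by IsUrlSafeChar in the listing above (hand-copied).
    private const string UrlEncodePunctuation = "()*-._!";

    // Special characters that RFC1738 allows unencoded.
    private const string Rfc1738Special = "$-_.+!*'(),";

    // RFC1738 special characters that UrlEncode escapes anyway.
    public static string EscapedAnyway()
    {
        return new string(Rfc1738Special
            .Where(c => UrlEncodePunctuation.IndexOf(c) < 0)
            .ToArray());
    }
}
```

Every character in the hand-copied safe set also appears in the RFC1738 list, which is consistent with the subset claim; EscapedAnyway() returns the RFC1738 characters that are left over.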

With respect to line breaks, if the input has a CR LF sequence, it will be escaped as %0D%0A. However, if the input has only LF, it will be escaped as %0A (so this routine does not normalize line breaks).
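
If the caller wants CR LF pairs regardless of how the input was produced, the line breaks can be normalized before encoding. The following is a sketch under assumptions: FormValue, NormalizeLineBreaks, and Encode are hypothetical names, and treating a lone CR as a line break is a choice made for this example, not something the spec text above requires.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical helper, not part of the framework.
public static class FormValue
{
    // Turn CR LF, lone CR, and lone LF into CR LF.
    // "\r\n" is listed first so an existing pair is matched as one unit.
    public static string NormalizeLineBreaks(string text)
    {
        return Regex.Replace(text, "\r\n|\r|\n", "\r\n");
    }

    // Normalize first, then let UrlEncode escape the resulting CR LF pairs.
    public static string Encode(string text)
    {
        return HttpUtility.UrlEncode(NormalizeLineBreaks(text));
    }
}
```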

Bottom line: it meets the specification while additionally encoding $ and , (along with ' and +), and the caller is responsible for providing suitably normalized line breaks in the input.
