
Is the "&" symbol allowed in the PATH segment of a URL?

Is the "&" symbol allowed in the PATH segment of a URL, or should it be escaped?

From the Nu W3C validator ( https://validator.w3.org/nu/ ) I got:

Error: & did not start a character reference. (& probably should have been escaped as &amp;.)
At line 407, column 52
<a href="/Bags-&-Purses/c/wome

However, if I try to encode the URL via the Java URI class, spaces and similar characters get encoded, but the & symbol does not.

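// Build the URI from the current request; this constructor quotes characters that are illegal in each component.
URI u = new URI(request.getScheme(), null,
        request.getServerName(), request.getServerPort(),
        request.getContextPath() + url,
        query, null);
// Keep the encoded string instead of discarding it.
String result = u.toURL().toString();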

where the url string was: /Bags-&-Purses/c/womens-accessories-bags

The result is https://localhost:8112/storefront/Bags-&-Purses/c/womens-accessories-bags (the & is not encoded).

The question is why the & is not escaped. Is this valid? I guessed it should be escaped as %26, but it does not get escaped.

&, while a reserved character, seems to be a valid character for the path segment in a URI. If you look at the grammar given for the path segment in RFC 3986, section 3.3, & is allowed as part of the sub-delims group:

  path          = path-abempty    ; begins with "/" or is empty
                / path-absolute   ; begins with "/" but not "//"
                / path-noscheme   ; begins with a non-colon segment
                / path-rootless   ; begins with a segment
                / path-empty      ; zero characters

  path-abempty  = *( "/" segment )
  path-absolute = "/" [ segment-nz *( "/" segment ) ]
  path-noscheme = segment-nz-nc *( "/" segment )
  path-rootless = segment-nz *( "/" segment )
  path-empty    = 0<pchar>

  segment       = *pchar
  segment-nz    = 1*pchar
  segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                ; non-zero-length segment without any colon ":"

  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

(...)

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="
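
The grammar above can be checked mechanically. The following is a minimal sketch, not a full URI parser: it tests whether a string is a valid path segment under the pchar rule. The class and method names are hypothetical, and the definition of unreserved (ALPHA / DIGIT / "-" / "." / "_" / "~") is taken from RFC 3986 section 2.3, since it is not part of the excerpt quoted here.

```java
final class PathSegmentChars {
    // sub-delims, copied from the grammar quoted above
    private static final String SUB_DELIMS = "!$&'()*+,;=";
    // Non-alphanumeric part of "unreserved" (RFC 3986 section 2.3; not in the excerpt above)
    private static final String UNRESERVED_MARKS = "-._~";

    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || UNRESERVED_MARKS.indexOf(c) >= 0;
    }

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // pchar without the pct-encoded alternative: unreserved / sub-delims / ":" / "@"
    static boolean isLiteralPchar(char c) {
        return isUnreserved(c) || SUB_DELIMS.indexOf(c) >= 0 || c == ':' || c == '@';
    }

    // segment = *pchar, where pct-encoded = "%" HEXDIG HEXDIG
    static boolean isValidSegment(String segment) {
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c == '%') {
                // A percent sign must be followed by exactly two hex digits.
                if (i + 2 >= segment.length()
                        || !isHexDigit(segment.charAt(i + 1))
                        || !isHexDigit(segment.charAt(i + 2))) {
                    return false;
                }
                i += 2;
            } else if (!isLiteralPchar(c)) {
                return false;
            }
        }
        return true;
    }
}
```

Under this sketch, `isValidSegment("Bags-&-Purses")` returns true because & is in sub-delims, while `isValidSegment("Bags & Purses")` returns false because a raw space is not a pchar.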

While you're asking about URLs and not the more general URIs, as far as I can tell, a URL does not impose extra restrictions on the path segment. Section 2.2 of the same RFC states that reserved characters should be percent-encoded unless they're specifically allowed in that component. In this case, all the characters in the sub-delims group (& included) seem to be specifically allowed in the path segment, as per the grammar above.
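
This matches what java.net.URI does. As a rough illustration, the sketch below uses the same seven-argument constructor as in the question, with the scheme, host, and port hard-coded to the values from the question's output rather than read from a request. The class name, the helper method, and the sample path containing spaces are assumptions made for the demonstration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PathAmpersandDemo {
    // Builds a URL string for the given path; host and port are fixed sample values.
    static String build(String path) {
        try {
            // Arguments: scheme, userInfo, host, port, path, query, fragment
            URI u = new URI("https", null, "localhost", 8112, path, null, null);
            return u.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // The space is illegal in a path and gets quoted; the & is legal and stays as-is.
        System.out.println(build("/storefront/Bags & Purses/c/womens-accessories-bags"));
    }
}
```

Running this prints `https://localhost:8112/storefront/Bags%20&%20Purses/c/womens-accessories-bags`: the constructor quotes only characters that are illegal in the path, so the space becomes %20 while the & is left alone.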

However, the issue you're having here is not with the URL itself but with its textual representation when included in an HTML document. An ampersand cannot appear on its own in HTML and must always be encoded as &amp;. Related question: Do I really need to encode '&' as '&amp;'?
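
In practice this means the URL can stay as it is, and only the HTML output needs escaping. The following is a minimal hand-rolled sketch of escaping an attribute value before writing it into the page. The class and method names are hypothetical, and in a real project a template engine or an existing escaping library would normally do this for you.

```java
final class HtmlAttributeEscaper {
    // Replaces the characters that are significant inside a double-quoted HTML attribute.
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String href = "/Bags-&-Purses/c/womens-accessories-bags";
        // The browser decodes &amp; back to & before resolving the link.
        System.out.println("<a href=\"" + escapeAttribute(href) + "\">Bags & Purses</a>"
                .replace("Bags & Purses", "Bags &amp; Purses"));
    }
}
```

With this, the href from the question is written as `/Bags-&amp;-Purses/c/womens-accessories-bags` in the HTML source, which the browser reads back as the original path containing a plain &.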
