Are IRIs valid as HTML attribute values?

Is it valid HTML to use IRIs containing non-ASCII characters as attribute values (e.g. for href attributes) instead of URIs? Are there any differences among the HTML flavors (HTML and XHTML, 4 and 5)? At least RFC 3986 seems to imply that it isn't valid.

I realize that it would probably be safer to use percent-encoding (with regard to older and IRI-unaware software), but I'm looking for a definitive answer with regard to the standard.

So far, I've done some tests with the W3C validator, and unescaped Unicode characters in URIs don't trigger any warnings or errors with HTML 4/5 and XHTML 4/5 doctypes (but of course the absence of error messages doesn't imply the absence of errors).

At least Chrome also supports raw UTF-8 IRIs, but percent-escapes them before firing an HTTP request. Also, my web server (lighttpd) seems to support UTF-8 characters in an HTTP request both in percent-encoded and in unencoded form.
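To make the escaping step concrete, here is a minimal Python sketch of what percent-encoding a UTF-8 path looks like. It is only an illustration using the standard library's `urllib.parse`, not a description of Chrome's actual implementation; the path `/Håkon` is a hypothetical example.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical path containing one non-ASCII character (U+00E5).
iri_path = "/Håkon"

# Percent-encode the UTF-8 bytes of every character that is not
# unreserved, keeping "/" as the path separator.
wire_path = quote(iri_path, safe="/")
print(wire_path)

# Decoding the escaped form gives back the same UTF-8 bytes, which is
# why a server can treat both spellings as the same resource.
same_bytes = unquote_to_bytes(wire_path) == iri_path.encode("utf-8")
print(same_bytes)
```

The escaped form contains only ASCII, so it is safe for software that is unaware of IRIs.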

HTML 4.01 is straightforward enough. Different attributes have different rules as to what they can contain, but if we're dealing with the href attribute on an <a> element, then the HTML 4 spec, section B.2.1 "Non-ASCII characters in URI attribute values", says:

... the following href value is illegal:

<A href="http://foo.org/Håkon">...</A>

HTML5 is different. It says IRIs are valid provided they comply with some additional conditions:

A URL is a valid URL if at least one of the following conditions holds:

  • The URL is a valid URI reference [RFC3986].

  • The URL is a valid IRI reference and it has no query component. [RFC3987]

  • The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]

  • The URL is a valid IRI reference and the character encoding of the URL's Document is UTF-8 or a UTF-16 encoding. [RFC3987]
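The four conditions above can be sketched as a small Python check. This is a rough illustration, not a validator: it assumes the input is already a syntactically valid URI or IRI reference and only decides whether its non-ASCII characters are acceptable under the quoted rules. The function name, the encoding set, and the example URLs are all hypothetical.

```python
from urllib.parse import urlsplit

# Assumed spellings of the encodings named in the quoted rule.
UNICODE_ENCODINGS = {"utf-8", "utf-16", "utf-16le", "utf-16be"}

def satisfies_quoted_conditions(url: str, document_encoding: str) -> bool:
    """Check the quoted conditions, assuming `url` is well-formed."""
    if url.isascii():
        # No raw non-ASCII characters: covered by the URI reference case.
        return True
    query = urlsplit(url).query
    if not query:
        # IRI reference with no (or an empty) query component.
        return True
    if query.isascii():
        # The query contains no unescaped non-ASCII characters.
        return True
    # Raw non-ASCII in the query: only acceptable in a UTF-8 or UTF-16 document.
    return document_encoding.lower() in UNICODE_ENCODINGS

print(satisfies_quoted_conditions("http://foo.org/Håkon", "iso-8859-1"))
print(satisfies_quoted_conditions("http://foo.org/search?q=Håkon", "iso-8859-1"))
print(satisfies_quoted_conditions("http://foo.org/search?q=Håkon", "utf-8"))
```

Under these assumptions, raw non-ASCII characters are only a problem when they appear in the query component of a URL in a document that is not encoded as UTF-8 or UTF-16.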

XHTML 1.x follows the same rules as HTML 4.01.

XHTML5 is the same as HTML5.

When in doubt, read the official HTML specs for definitive answers.

HTML 4 does not support IRIs at all. They must either be converted to URIs per RFC 3987 Section 3.1, or have their non-ASCII URI data encoded as UTF-8 and then percent-encoded per HTML 4 Section B.2.1.
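As a rough sketch of that conversion in Python: each non-ASCII character is replaced by the percent-encoded bytes of its UTF-8 form, and everything that is already ASCII is left untouched. The function name `iri_to_uri` is hypothetical, and the sketch makes simplifying assumptions: the input is already a valid IRI in the expected normalization form, and the host part is ASCII (converting internationalized domain names is not handled here).

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character."""
    parts = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII, including existing %XX escapes and delimiters: keep as-is.
            parts.append(ch)
        else:
            # Non-ASCII: one %XX triplet per UTF-8 byte.
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)

print(iri_to_uri("http://foo.org/Håkon"))
```

Applied to the href value from the HTML 4 example, this produces a plain-ASCII URI that is acceptable as an HTML 4 attribute value.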

HTML5 supports both URIs and IRIs in all places where URLs are allowed, per HTML5 Section 2.6.
