
JSON Unicode escape sequence - lowercase or not?

I was reading RFC 4627 and I can't figure out whether the following is valid JSON. Consider this minimal JSON text:

["\u005c"]

The problem is the lowercase c.

According to the text of the RFC, it is allowed:

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A though F can be upper or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

(Emphasis mine)

The problem is that the RFC also contains the grammar for this:

char = unescaped /
       escape (
           %x22 /          ; "    quotation mark  U+0022
           %x5C /          ; \    reverse solidus U+005C
           %x2F /          ; /    solidus         U+002F
           %x62 /          ; b    backspace       U+0008
           %x66 /          ; f    form feed       U+000C
           %x6E /          ; n    line feed       U+000A
           %x72 /          ; r    carriage return U+000D
           %x74 /          ; t    tab             U+0009
           %x75 4HEXDIG )  ; uXXXX                U+XXXX

where HEXDIG is defined in the referenced RFC 4234 as

HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

which includes only uppercase letters.

FWIW, from what I have researched, most JSON parsers accept both uppercase and lowercase letters.
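
As one data point for that claim, here is a minimal sketch using Python's standard `json` module. It only shows how this particular parser behaves and says nothing about what the RFC grammar requires; the variable names are purely illustrative.

```python
import json

# The same JSON text with a lowercase and an uppercase hex digit.
# Raw strings keep the backslash so the JSON parser sees the escape itself.
lower = json.loads(r'["\u005c"]')
upper = json.loads(r'["\u005C"]')

# Both forms decode to a list holding a single reverse solidus (backslash).
same_result = lower == upper == ["\\"]
print(same_result)
```

Other parsers may differ, so a check like this is worth repeating for whichever library you actually depend on.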

Question(s): What is actually correct? Is there a contradiction, and should the grammar in the RFC be fixed?

I think it's explained by this part of RFC 4234:

ABNF strings are case-insensitive and the character set for these strings is us-ascii.

Hence:

  rulename = "abc" 

and:

  rulename = "aBc" 

will match "abc", "Abc", "aBc", "abC", "ABc", "aBC", "AbC", and "ABC".

On the other hand, the follow-on part is not terribly clear:

To specify a rule that IS case SENSITIVE, specify the characters individually.

For example:

  rulename = %d97 %d98 %d99 

or

  rulename = %d97.98.99 

In the case of the HEXDIG rule, they're individual characters to start with - but they're specified as quoted literals ("A" etc.) rather than as numeric values such as %x41, so I suspect that means they're case-insensitive. It's not the clearest spec I've read :(
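
To make the two readings concrete, here is a small Python sketch. It is not taken from either RFC: the names `HEXDIG_INSENSITIVE`, `HEXDIG_SENSITIVE`, and `is_unicode_escape` are hypothetical, and regular expressions stand in for the ABNF rules. Note that the `u` itself is given as `%x75` in the JSON grammar, a numeric value, so it stays lowercase-only under either reading.

```python
import re

# HEXDIG read the ABNF way: quoted literals "A".."F" match either case.
HEXDIG_INSENSITIVE = re.compile(r"[0-9A-F]", re.IGNORECASE)

# HEXDIG as it would behave if the letters were case-sensitive,
# e.g. if they had been written as numeric values instead of quoted strings.
HEXDIG_SENSITIVE = re.compile(r"[0-9A-F]")


def is_unicode_escape(text, hexdig):
    """Return True if text is a backslash, a lowercase u, and four hex digits."""
    if len(text) != 6 or not text.startswith("\\u"):
        return False
    return all(hexdig.fullmatch(ch) is not None for ch in text[2:])


escape = "\\u005c"  # the six characters \u005c from the question

print(is_unicode_escape(escape, HEXDIG_INSENSITIVE))
print(is_unicode_escape(escape, HEXDIG_SENSITIVE))
```

Under the case-insensitive reading the escape from the question is accepted, which agrees with the RFC's prose; under the case-sensitive reading it would be rejected.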
