
Is JavaScript's toUpperCase() language safe?

Will JavaScript's String prototype method toUpperCase() deliver the naturally expected result in every UTF-8-supported language/charset?

I've tried Simplified Chinese, Korean, Tamil, Japanese and Cyrillic, and the results seemed reasonable so far. Can I rely on the method being language-safe?

Example:

  "イロハニホヘトチリヌルヲワカヨタレソツネナラムウヰノオクヤマケフコエテアサキユメミシヱヒモセス".toUpperCase()
> "イロハニホヘトチリヌルヲワカヨタレソツネナラムウヰノオクヤマケフコエテアサキユメミシヱヒモセス"

Edit: As @Quentin pointed out, there is also String.prototype.toLocaleUpperCase(), which is probably even "safer" to use, but I also have to support IE 8 and above, as well as WebKit-based browsers. Since it is part of the ECMAScript 3 standard, it should be available in all those browsers, right?

Does anyone know of any cases where using it delivers unexpected results?

What do you expect?

JavaScript's toUpperCase() method is supposed to use the "locale invariant upper case mapping" as defined by the Unicode standard. So, basically, "i".toUpperCase() is supposed to be I in all cases. Where the locale invariant upper case mapping consists of multiple letters, not every browser has uppercased them correctly; for example, "ß".toUpperCase() has not always been SS, although that is the result the specification requires.
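The two claims above can be checked directly. This is a minimal sketch that assumes a spec-conforming engine (current browsers or Node.js); the variable names are only illustrative, and older browsers may still return "ß" unchanged.

```javascript
// toUpperCase() ignores the host locale: a spec-conforming engine returns
// these results regardless of the user's language settings.
const plain = "i".toUpperCase();        // "I", never the Turkish "İ"
const sharpS = "ß".toUpperCase();       // "SS", a one-to-two letter mapping
const word = "straße".toUpperCase();    // "STRASSE"

// Because of the multi-letter mapping, the result can be longer than the input.
const grew = word.length - "straße".length; // 1

console.log(plain, sharpS, word, grew);
```

One practical consequence is that code which assumes the uppercased string has the same length as the original (for example, when mapping character offsets) can be off by one or more.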

Also, there are locales that have different uppercase rules than the rest of the world, the most notable example being Turkish, where the uppercase version of i is İ (and vice versa) and the lowercase version of I is ı (and vice versa).

If you want that behaviour, you need a browser that is set to the Turkish locale, and you have to use the toLocaleUpperCase() method.
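If the Turkish mapping is needed regardless of how the browser is configured, one possible approach is sketched below. It relies on an assumption that goes beyond the answer above: newer engines accept an optional locale argument to toLocaleUpperCase() (added by the ECMAScript Internationalization API), which IE 8 does not support. The helper name toTurkishUpperCase is hypothetical, and the fallback branch only handles the dotted/dotless i pair.

```javascript
// Hypothetical helper (not part of any library): uppercase a string using
// Turkish rules, independent of the browser's configured locale.
function toTurkishUpperCase(text) {
  // Feature-detect the optional locale argument. Engines that ignore it
  // (and are not running in a Turkish locale) fall through to the manual mapping.
  let supportsLocaleArg = false;
  try {
    supportsLocaleArg = "i".toLocaleUpperCase("tr") === "\u0130"; // İ
  } catch (e) {
    supportsLocaleArg = false;
  }
  if (supportsLocaleArg) {
    return text.toLocaleUpperCase("tr");
  }
  // Fallback: map dotted i to İ first, then apply the locale-invariant
  // mapping, which already turns dotless ı into I.
  return text.replace(/i/g, "\u0130").toUpperCase();
}

const invariant = "istanbul".toUpperCase();      // "ISTANBUL"
const turkish = toTurkishUpperCase("istanbul");  // "İSTANBUL"
const dotless = toTurkishUpperCase("ılık");      // "ILIK"

console.log(invariant, turkish, dotless);
```

The feature test is there because an engine without locale-argument support silently ignores the argument instead of throwing, so the result cannot be trusted without checking it.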

Also note that some writing systems have a third case, "title case", which is applied to the first letter of a word when you want to "capitalize" it. This is also defined by the Unicode standard (for example, the title case of the digraph character ǌ, U+01CC, is ǋ, U+01CB, while the upper case is Ǌ, U+01CA), but (as far as I know) it is not available to JavaScript. Therefore, if you try to capitalize a word using substring and toUpperCase, expect it to be wrong in rare cases.
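The rare failure described above can be reproduced with a naive capitalization helper. This is a sketch only: the function name capitalize and the sample inputs are made up for illustration ("ßa" is an artificial string, not a real word), and a spec-conforming engine is assumed.

```javascript
// Naive capitalization: uppercase the first UTF-16 code unit, keep the rest.
function capitalize(word) {
  return word.substring(0, 1).toUpperCase() + word.substring(1);
}

const ok = capitalize("hello"); // "Hello"

// U+01CC (the nj digraph) uppercases to U+01CA (NJ); the title-case form
// U+01CB (Nj) is never produced by toUpperCase().
const digraph = capitalize("\u01CCegoš");
const firstIsUpperNJ = digraph.charAt(0) === "\u01CA"; // true
const firstIsTitleNj = digraph.charAt(0) === "\u01CB"; // false

// A multi-letter mapping leaks through as well: "ß" becomes "SS", not "Ss".
const sharp = capitalize("ßa"); // "SSa"

console.log(ok, digraph, firstIsUpperNJ, firstIsTitleNj, sharp);
```

Getting the title-case forms right would need a lookup table built from the Unicode data files, since the language exposes no title-case method.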

Yes. From the spec:

[Returns] a String where each character is either the Unicode uppercase equivalent of the corresponding character of [the input] or the actual corresponding character of [the input] if no Unicode uppercase equivalent exists.

For the purposes of this operation, the 16-bit code units of the Strings are treated as code points in the Unicode Basic Multilingual Plane. Surrogate code points are directly transferred from [input to output] without any mapping.

The result must be derived according to the case mappings in the Unicode character database (this explicitly includes not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later).

So while this might not exactly match your language's expectations (many languages use the same characters, but not necessarily in the same way), it certainly does deliver the naturally expected result as specified in the Unicode Character Database.


 