How do I make toLowerCase() and toUpperCase() consistent across browsers
Are there JavaScript polyfill implementations of String.toLowerCase() and String.toUpperCase(), or other methods in JavaScript that can work with Unicode characters and are consistent across browsers?
Performing the following gives different results in different browsers, or even between versions of the same browser (e.g. Firefox 54 vs. 55):
document.write(String.fromCodePoint(223).normalize("NFKC").toLowerCase().toUpperCase().toLowerCase())
In Firefox 55 it gives you ss, in Firefox 54 it gives you ß.
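A quick way to check which behavior a given engine has is to test the uppercase mapping of U+00DF directly. This is a minimal sketch; the helper name usesFullCaseMapping is made up for illustration and is not part of any library.

```javascript
// Hypothetical helper: reports whether the engine applies the full
// (multi-character) uppercase mapping for U+00DF, i.e. 'ß' -> 'SS'.
function usesFullCaseMapping() {
  return '\u00DF'.toUpperCase() === 'SS';
}

// The round trip from the question. It yields 'ss' where the full mapping
// is applied and 'ß' where it is not (as reported above for Firefox 54).
const roundTrip = '\u00DF'
  .normalize('NFKC')
  .toLowerCase()
  .toUpperCase()
  .toLowerCase();

console.log(usesFullCaseMapping(), roundTrip);
```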
Generally this is fine, and mechanisms such as locales handle a lot of the cases you'd want; however, when you need consistent behavior across platforms, for example when talking to BaaS systems like google-cloud-firestore, a consistent implementation can greatly simplify interactions where you're essentially processing internal data on the client.
Note that this issue only seems to affect outdated versions of Firefox, so unless you explicitly need to support those old versions, you could choose to just not bother at all. The behavior for your example is the same in all modern browsers (since the change in Firefox). This can be verified using jsvu + eshost:
$ jsvu # Update installed JavaScript engine binaries to the latest version.
$ eshost -e '"\xDF".normalize("NFKC").toLowerCase().toUpperCase().toLowerCase()'
#### Chakra
ss
#### V8 --harmony
ss
#### JavaScriptCore
ss
#### V8
ss
#### SpiderMonkey
ss
#### xs
ss
But you asked how to solve this problem, so let's continue.
Step 4 of https://tc39.github.io/ecma262/#sec-string.prototype.tolowercase states:
Let cuList be a List where the elements are the result of toLowercase(cpList), according to the Unicode Default Case Conversion algorithm.
This Unicode Default Case Conversion algorithm is specified in section 3.13, Default Case Algorithms, of the Unicode standard:
The full case mappings for Unicode characters are obtained by using the mappings from SpecialCasing.txt plus the mappings from UnicodeData.txt, excluding any of the latter mappings that would conflict. Any character that does not have a mapping in these files is considered to map to itself. […]
The following rules specify the default case conversion operations for Unicode strings. These rules use the full case conversion operations, Uppercase_Mapping(C), Lowercase_Mapping(C), and Titlecase_Mapping(C), as well as the context-dependent mappings based on the casing context, as specified in Table 3-17.
For a string X:
- R1 toUppercase(X): Map each character C in X to Uppercase_Mapping(C).
- R2 toLowercase(X): Map each character C in X to Lowercase_Mapping(C).
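Rule R1 and the lookup order described in the quote can be sketched as follows. The two maps are tiny hand-written excerpts for illustration only (the real tables come from the Unicode data files), the function names are made up, and the context-dependent mappings from Table 3-17 are deliberately left out.

```javascript
// Illustrative excerpts, not the real tables.
const specialUpper = new Map([['\u00DF', 'SS']]); // as in SpecialCasing.txt
const simpleUpper = new Map([                     // as in UnicodeData.txt
  ['a', 'A'], ['e', 'E'], ['r', 'R'], ['s', 'S'], ['t', 'T'],
]);

// Uppercase_Mapping(C): the special mapping wins over the simple one,
// and a character without any mapping maps to itself.
function uppercaseMapping(c) {
  if (specialUpper.has(c)) return specialUpper.get(c);
  if (simpleUpper.has(c)) return simpleUpper.get(c);
  return c;
}

// R1 toUppercase(X): map each character (code point) C in X.
function toUppercase(x) {
  return Array.from(x, (c) => uppercaseMapping(c)).join('');
}

console.log(toUppercase('stra\u00DFe')); // STRASSE
```

Iterating with Array.from walks the string by code point rather than by UTF-16 code unit, which matches the "each character C" wording more closely than indexing would.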
Here's an example from SpecialCasing.txt, with my annotation added below:
00DF ; 00DF ; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
<code>; <lower>; <title> ; <upper> ; (<condition_list>;)? # <comment>
This line says that U+00DF ('ß') lowercases to U+00DF ('ß') and uppercases to U+0053 U+0053 ('SS').
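Reading such a line programmatically could look like the sketch below. The function name parseSpecialCasingLine and the shape of the returned object are assumptions made for illustration; the field order follows the annotation above.

```javascript
// Converts a space-separated list of hex code points ("0053 0073") to a
// string ("Ss"). An empty field yields an empty string.
function hexListToString(hexList) {
  if (hexList === '') return '';
  const codePoints = hexList.split(/\s+/).map((hex) => parseInt(hex, 16));
  return String.fromCodePoint(...codePoints);
}

// Hypothetical parser for one line of SpecialCasing.txt.
// Returns null for blank lines and comment-only lines.
function parseSpecialCasingLine(line) {
  const data = line.split('#')[0].trim(); // drop the trailing comment
  if (data === '') return null;
  // <code>; <lower>; <title>; <upper>; (<condition_list>;)?
  const [code, lower, title, upper, conditions = ''] =
    data.split(';').map((field) => field.trim());
  return {
    code: hexListToString(code),
    lower: hexListToString(lower),
    title: hexListToString(title),
    upper: hexListToString(upper),
    conditions: conditions === '' ? [] : conditions.split(/\s+/),
  };
}

const sharpS = parseSpecialCasingLine(
  '00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S'
);
console.log(sharpS.upper); // SS
```

Entries with a non-empty conditions list depend on context or locale, so a generator that only wants unconditional mappings would skip them.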
Here's an example from UnicodeData.txt, with my annotation added below:
0041 ; LATIN CAPITAL LETTER A; Lu;0;L;;;;;N;;;; 0061 ;
<code>; <name> ; <ignore> ; <lower>; <upper>
This line says that U+0041 ('A') lowercases to U+0061 ('a'). It doesn't have an explicit uppercase mapping, meaning it uppercases to itself.
Here's another example from UnicodeData.txt:
0061 ; LATIN SMALL LETTER A; Ll;0;L;;;;;N;; ;0041; ; 0041
<code>; <name> ; <ignore> ; <lower>; <upper>
This line says that U+0061 ('a') uppercases to U+0041 ('A'). It doesn't have an explicit lowercase mapping, meaning it lowercases to itself.
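The same idea for UnicodeData.txt, as a minimal sketch. The function name parseUnicodeDataLine is made up; the field positions used here (simple uppercase in field 12, simple lowercase in field 13, counting from 0) follow the documented layout of UnicodeData.txt.

```javascript
// Hypothetical parser for one line of UnicodeData.txt. Only the simple
// case mappings are extracted; a missing mapping falls back to the
// character itself, as described above.
function parseUnicodeDataLine(line) {
  const fields = line.split(';').map((field) => field.trim());
  const toChar = (hex) => (hex ? String.fromCodePoint(parseInt(hex, 16)) : null);
  const char = toChar(fields[0]);
  return {
    char,
    upper: toChar(fields[12]) || char, // Simple_Uppercase_Mapping
    lower: toChar(fields[13]) || char, // Simple_Lowercase_Mapping
  };
}

const capitalA = parseUnicodeDataLine('0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;');
const smallA = parseUnicodeDataLine('0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041');
console.log(capitalA.lower, capitalA.upper); // a A
console.log(smallA.lower, smallA.upper);     // a A
```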
You could write a script that parses these two files, reads each line following these examples, and builds lowercase/uppercase mappings. You could then turn those mappings into a small JavaScript library that provides spec-compliant toLowerCase / toUpperCase functionality.
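Turning a finished mapping into a replacement function could be sketched like this. The names buildReplacer and escapeForCharClass are hypothetical, and the sketch assumes every key is a single BMP character (one UTF-16 code unit).

```javascript
// Escapes one BMP character as \uXXXX for use inside a regex character class.
function escapeForCharClass(char) {
  const hex = char.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
  return '\\u' + hex;
}

// Builds a function that replaces every key of `mappings` found in a
// string with its mapped value and leaves all other characters alone.
function buildReplacer(mappings) {
  const charClass = Array.from(mappings.keys(), escapeForCharClass).join('');
  const pattern = new RegExp('[' + charClass + ']', 'g');
  return (string) => string.replace(pattern, (match) => mappings.get(match));
}

// Two-entry example mapping, for illustration only.
const upperSpecialCases = buildReplacer(new Map([
  ['\u00DF', 'SS'], // LATIN SMALL LETTER SHARP S
  ['\uFB01', 'FI'], // LATIN SMALL LIGATURE FI
]));
console.log(upperSpecialCases('stra\u00DFe')); // straSSe
```

Only the listed characters are touched; everything else would still go through the engine's native method afterwards.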
This seems like a lot of work. Depending on the old behavior in Firefox and what exactly changed (?), you could probably limit the work to just the special mappings in SpecialCasing.txt. (I'm assuming, based on the example you provided, that only the special casings changed in Firefox 55.)
// Instead of…
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const lowercased = normalized.toLowerCase();
  return lowercased;
}
// …one could do something like:
function lowerCaseSpecialCases(string) {
  // TODO: replace all SpecialCasing.txt characters with their lowercase
  // mapping.
  return string.replace(/TODO/g, fn);
}
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const fixed = lowerCaseSpecialCases(normalized); // Workaround for old Firefox 54 behavior.
  const lowercased = fixed.toLowerCase();
  return lowercased;
}
I wrote a script that parses SpecialCasing.txt and generates a JS library that implements the lowerCaseSpecialCases functionality mentioned above (as toLower) as well as toUpper. Here it is: https://gist.github.com/mathiasbynens/a37e3f3138069729aa434ea90eea4a3c
Depending on your exact use case, you might not need toUpper and its corresponding regex and map at all. Here's the full generated library:
const reToLower = /[\u0130\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]/g;
const toLowerMap = new Map([
['\u0130', 'i\u0307'],
['\u1F88', '\u1F80'],
['\u1F89', '\u1F81'],
['\u1F8A', '\u1F82'],
['\u1F8B', '\u1F83'],
['\u1F8C', '\u1F84'],
['\u1F8D', '\u1F85'],
['\u1F8E', '\u1F86'],
['\u1F8F', '\u1F87'],
['\u1F98', '\u1F90'],
['\u1F99', '\u1F91'],
['\u1F9A', '\u1F92'],
['\u1F9B', '\u1F93'],
['\u1F9C', '\u1F94'],
['\u1F9D', '\u1F95'],
['\u1F9E', '\u1F96'],
['\u1F9F', '\u1F97'],
['\u1FA8', '\u1FA0'],
['\u1FA9', '\u1FA1'],
['\u1FAA', '\u1FA2'],
['\u1FAB', '\u1FA3'],
['\u1FAC', '\u1FA4'],
['\u1FAD', '\u1FA5'],
['\u1FAE', '\u1FA6'],
['\u1FAF', '\u1FA7'],
['\u1FBC', '\u1FB3'],
['\u1FCC', '\u1FC3'],
['\u1FFC', '\u1FF3']
]);
const toLower = (string) => string.replace(reToLower, (match) => toLowerMap.get(match));
const reToUpper = /[\xDF\u0149\u01F0\u0390\u03B0\u0587\u1E96-\u1E9A\u1F50\u1F52\u1F54\u1F56\u1F80-\u1FAF\u1FB2-\u1FB4\u1FB6\u1FB7\u1FBC\u1FC2-\u1FC4\u1FC6\u1FC7\u1FCC\u1FD2\u1FD3\u1FD6\u1FD7\u1FE2-\u1FE4\u1FE6\u1FE7\u1FF2-\u1FF4\u1FF6\u1FF7\u1FFC\uFB00-\uFB06\uFB13-\uFB17]/g;
const toUpperMap = new Map([
['\xDF', 'SS'],
['\uFB00', 'FF'],
['\uFB01', 'FI'],
['\uFB02', 'FL'],
['\uFB03', 'FFI'],
['\uFB04', 'FFL'],
['\uFB05', 'ST'],
['\uFB06', 'ST'],
['\u0587', '\u0535\u0552'],
['\uFB13', '\u0544\u0546'],
['\uFB14', '\u0544\u0535'],
['\uFB15', '\u0544\u053B'],
['\uFB16', '\u054E\u0546'],
['\uFB17', '\u0544\u053D'],
['\u0149', '\u02BCN'],
['\u0390', '\u0399\u0308\u0301'],
['\u03B0', '\u03A5\u0308\u0301'],
['\u01F0', 'J\u030C'],
['\u1E96', 'H\u0331'],
['\u1E97', 'T\u0308'],
['\u1E98', 'W\u030A'],
['\u1E99', 'Y\u030A'],
['\u1E9A', 'A\u02BE'],
['\u1F50', '\u03A5\u0313'],
['\u1F52', '\u03A5\u0313\u0300'],
['\u1F54', '\u03A5\u0313\u0301'],
['\u1F56', '\u03A5\u0313\u0342'],
['\u1FB6', '\u0391\u0342'],
['\u1FC6', '\u0397\u0342'],
['\u1FD2', '\u0399\u0308\u0300'],
['\u1FD3', '\u0399\u0308\u0301'],
['\u1FD6', '\u0399\u0342'],
['\u1FD7', '\u0399\u0308\u0342'],
['\u1FE2', '\u03A5\u0308\u0300'],
['\u1FE3', '\u03A5\u0308\u0301'],
['\u1FE4', '\u03A1\u0313'],
['\u1FE6', '\u03A5\u0342'],
['\u1FE7', '\u03A5\u0308\u0342'],
['\u1FF6', '\u03A9\u0342'],
['\u1F80', '\u1F08\u0399'],
['\u1F81', '\u1F09\u0399'],
['\u1F82', '\u1F0A\u0399'],
['\u1F83', '\u1F0B\u0399'],
['\u1F84', '\u1F0C\u0399'],
['\u1F85', '\u1F0D\u0399'],
['\u1F86', '\u1F0E\u0399'],
['\u1F87', '\u1F0F\u0399'],
['\u1F88', '\u1F08\u0399'],
['\u1F89', '\u1F09\u0399'],
['\u1F8A', '\u1F0A\u0399'],
['\u1F8B', '\u1F0B\u0399'],
['\u1F8C', '\u1F0C\u0399'],
['\u1F8D', '\u1F0D\u0399'],
['\u1F8E', '\u1F0E\u0399'],
['\u1F8F', '\u1F0F\u0399'],
['\u1F90', '\u1F28\u0399'],
['\u1F91', '\u1F29\u0399'],
['\u1F92', '\u1F2A\u0399'],
['\u1F93', '\u1F2B\u0399'],
['\u1F94', '\u1F2C\u0399'],
['\u1F95', '\u1F2D\u0399'],
['\u1F96', '\u1F2E\u0399'],
['\u1F97', '\u1F2F\u0399'],
['\u1F98', '\u1F28\u0399'],
['\u1F99', '\u1F29\u0399'],
['\u1F9A', '\u1F2A\u0399'],
['\u1F9B', '\u1F2B\u0399'],
['\u1F9C', '\u1F2C\u0399'],
['\u1F9D', '\u1F2D\u0399'],
['\u1F9E', '\u1F2E\u0399'],
['\u1F9F', '\u1F2F\u0399'],
['\u1FA0', '\u1F68\u0399'],
['\u1FA1', '\u1F69\u0399'],
['\u1FA2', '\u1F6A\u0399'],
['\u1FA3', '\u1F6B\u0399'],
['\u1FA4', '\u1F6C\u0399'],
['\u1FA5', '\u1F6D\u0399'],
['\u1FA6', '\u1F6E\u0399'],
['\u1FA7', '\u1F6F\u0399'],
['\u1FA8', '\u1F68\u0399'],
['\u1FA9', '\u1F69\u0399'],
['\u1FAA', '\u1F6A\u0399'],
['\u1FAB', '\u1F6B\u0399'],
['\u1FAC', '\u1F6C\u0399'],
['\u1FAD', '\u1F6D\u0399'],
['\u1FAE', '\u1F6E\u0399'],
['\u1FAF', '\u1F6F\u0399'],
['\u1FB3', '\u0391\u0399'],
['\u1FBC', '\u0391\u0399'],
['\u1FC3', '\u0397\u0399'],
['\u1FCC', '\u0397\u0399'],
['\u1FF3', '\u03A9\u0399'],
['\u1FFC', '\u03A9\u0399'],
['\u1FB2', '\u1FBA\u0399'],
['\u1FB4', '\u0386\u0399'],
['\u1FC2', '\u1FCA\u0399'],
['\u1FC4', '\u0389\u0399'],
['\u1FF2', '\u1FFA\u0399'],
['\u1FF4', '\u038F\u0399'],
['\u1FB7', '\u0391\u0342\u0399'],
['\u1FC7', '\u0397\u0342\u0399'],
['\u1FF7', '\u03A9\u0342\u0399']
]);
const toUpper = (string) => string.replace(reToUpper, (match) => toUpperMap.get(match));
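A possible way to use the library: apply the special-case replacement first and the engine's native method second, so that the native method only has to handle the simple mappings. The snippet repeats a two-entry excerpt of the tables under different names (the Excerpt suffix) so that it runs on its own; consistentUpperCase is a hypothetical wrapper name.

```javascript
// Two-entry excerpt of reToUpper / toUpperMap, repeated here under
// different names so this snippet is self-contained.
const reToUpperExcerpt = /[\xDF\uFB01]/g;
const toUpperMapExcerpt = new Map([
  ['\xDF', 'SS'],
  ['\uFB01', 'FI'],
]);
const toUpperExcerpt = (string) =>
  string.replace(reToUpperExcerpt, (match) => toUpperMapExcerpt.get(match));

// Special cases first, then the native method for everything else.
function consistentUpperCase(string) {
  return toUpperExcerpt(string).toUpperCase();
}

console.log(consistentUpperCase('stra\xDFe')); // STRASSE
```

With the full library in scope, toUpperExcerpt would simply be toUpper.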