How to sort strings in JavaScript by code point values?

I need to sort an array of strings, where elements are compared lexicographically as sequences of code point values, so that, for example, "Z" < "a" < "\?" < " " < "💩".

  1. Is there a more efficient way of comparing strings than manually iterating over both of them and comparing the corresponding code points?
  2. What if it is guaranteed that the strings don't have any lone surrogate code points (but may have surrogate pairs, so " " < "💩" should still hold)? Is there a more efficient procedure for this special case?

Note: there are many answers on Stack Overflow explaining how to sort strings, but they use either the localeCompare order or the order defined by the JavaScript comparison operators (which compare strings as sequences of UTF-16 code units). I am not interested in either of those.
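To make the distinction in the note concrete, here is a small sketch of how UTF-16 code unit order and code point order can disagree. The helper names codeUnits and codePoints and the sample characters are illustrative assumptions, not part of the original question.

```javascript
'use strict';

// Hypothetical helper: list the UTF-16 code units of a string in hex.
function codeUnits(s) {
    return Array.from({ length: s.length }, (_, i) => s.charCodeAt(i).toString(16));
}

// Hypothetical helper: list the code points of a string in hex.
function codePoints(s) {
    return Array.from(s, (ch) => ch.codePointAt(0).toString(16));
}

const bmp = '\uFF5E';       // U+FF5E, a single code unit above the surrogate range
const astral = '\u{1F4A9}'; // U+1F4A9, stored as the surrogate pair D83D DCA9

console.log(codeUnits(astral));  // [ 'd83d', 'dca9' ]
console.log(codePoints(astral)); // [ '1f4a9' ]

// The < operator compares code units, so 0xFF5E is compared with 0xD83D.
console.log(bmp < astral);       // false, although U+FF5E < U+1F4A9

// The default sort uses the same code unit order and puts the astral string first.
console.log([bmp, astral].sort()[0] === astral); // true
```

Only characters in the range U+E000 to U+FFFF are affected this way, because they are the only single code units that are numerically greater than the surrogates.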


It appears to be a surprisingly difficult problem. Here's a proof-of-concept (POC) implementation:

'use strict';

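// Compare two strings lexicographically as sequences of code point values.
function compareCodePoints(s1, s2) {
    // i indexes UTF-16 code units; both strings are identical before index i.
    const len = Math.min(s1.length, s2.length);
    let i = 0;
    while (i < len) {
        const cp1 = s1.codePointAt(i);
        const cp2 = s2.codePointAt(i);
        const order = cp1 - cp2;
        if (order !== 0) {
            return order;
        }
        // Equal code points: skip two code units for a surrogate pair, else one.
        i += cp1 > 0xFFFF ? 2 : 1;
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return s1.length - s2.length;
}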

let s =[];
let s1 = "abc𞸁z";
let s2 = "abc𞸂z";

s = [s1, s2];
console.log(s);
s.sort(compareCodePoints);
console.log(s);

console.log()

s = [s2, s1];
console.log(s);
s.sort(compareCodePoints);
console.log(s);

console.log()

s1 = "a";
s2 = "";

console.log([s1, s2]);
console.log(compareCodePoints(s1, s2));
console.log([s2, s1]);
console.log(compareCodePoints(s2, s1));

$ node codepoint.poc.js
[ 'abc𞸁z', 'abc𞸂z' ]
[ 'abc𞸁z', 'abc𞸂z' ]

[ 'abc𞸂z', 'abc𞸁z' ]
[ 'abc𞸁z', 'abc𞸂z' ]

[ 'a', '' ]
1
[ '', 'a' ]
-1
$
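
For the special case raised in the question (no lone surrogates, surrogate pairs allowed), one possible alternative is to compare code units directly and remap only the first differing pair of units so that surrogates sort above U+E000 to U+FFFF. The sketch below is an assumption-laden illustration, not part of the POC above: the names fixUp and compareWellFormed are hypothetical, it gives code point order only when both strings are well-formed UTF-16, and whether it is actually faster would need to be measured.

```javascript
'use strict';

// Map a UTF-16 code unit to a key whose numeric order matches code point order.
// Assumption: the unit comes from a string with no lone surrogates.
function fixUp(unit) {
    if (unit >= 0xE000) return unit - 0x800;  // U+E000..U+FFFF -> 0xD800..0xF7FF
    if (unit >= 0xD800) return unit + 0x2000; // surrogates     -> 0xF800..0xFFFF
    return unit;                              // below the surrogate range: unchanged
}

// Hypothetical comparator: code point order for well-formed strings only.
function compareWellFormed(s1, s2) {
    const len = Math.min(s1.length, s2.length);
    for (let i = 0; i < len; i++) {
        const u1 = s1.charCodeAt(i);
        const u2 = s2.charCodeAt(i);
        if (u1 !== u2) {
            // Only the first differing code units decide the order.
            return fixUp(u1) - fixUp(u2);
        }
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return s1.length - s2.length;
}

const sorted = ['\u{1F4A9}', '\uFF5E', 'a', 'Z', ''].sort(compareWellFormed);
console.log(sorted); // '', 'Z', 'a', U+FF5E, U+1F4A9
```

The remapping never decodes surrogate pairs: if the first difference falls on a high surrogate, the pair outranks any single BMP unit, and if it falls on a low surrogate, both strings share the same high surrogate, so comparing the low halves is enough.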
