How to sort strings in JavaScript by code point values?

I need to sort an array of strings, where elements are compared lexicographically as sequences of code point values, so that, for example, "Z" < "a" < "\?" < " " < "💩" .

  1. Is there a more efficient way of comparing strings, other than manually iterating over both of them and comparing the corresponding code points?
  2. What if it is guaranteed that the strings don't have any surrogate code points (but may have surrogate pairs, so " " < "💩" should still hold)? Is there a more efficient procedure for this special case?

Note: there are many answers on StackOverflow explaining how to sort strings, but they either use the localeCompare order or the order defined by JavaScript comparison operators (which compare strings as sequences of UTF-16 code units). I am not interested in either of those.
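
To make the difference concrete, here is a small illustration of how the built-in comparison operators and the default sort order disagree with code point order. The two sample characters (U+FF5E and U+1F4A9) and the variable names are chosen only for this illustration.

```javascript
'use strict';

// U+FF5E is a BMP character above the surrogate range (one code unit: 0xFF5E).
// U+1F4A9 is a supplementary character (surrogate pair: 0xD83D 0xDCA9).
const bmp = "\uFF5E";
const astral = "\u{1F4A9}";

// The < operator compares UTF-16 code units: 0xD83D < 0xFF5E,
// so the supplementary character is considered smaller.
const byCodeUnits = astral < bmp;                                   // true

// By code point value the order is the opposite: 0xFF5E < 0x1F4A9.
const byCodePoints = bmp.codePointAt(0) < astral.codePointAt(0);   // true

// sort() without a comparator also uses UTF-16 code unit order.
const defaultSorted = [bmp, astral].sort();

console.log(byCodeUnits, byCodePoints, defaultSorted[0] === astral);
```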


This turns out to be a surprisingly difficult problem. Here is a proof-of-concept (POC) implementation that walks both strings in parallel and compares them one code point at a time:

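'use strict';

// Compare two strings lexicographically by Unicode code point value.
// Returns a negative number, zero, or a positive number, as Array.prototype.sort expects.
function compareCodePoints(s1, s2) {
    const len = Math.min(s1.length, s2.length);
    let i = 0;
    while (i < len) {
        // codePointAt(i) decodes a surrogate pair that starts at index i;
        // a lone surrogate is returned as its own code unit value.
        const cp1 = s1.codePointAt(i);
        const cp2 = s2.codePointAt(i);
        if (cp1 !== cp2) {
            return cp1 - cp2;
        }
        // Equal code points: skip two code units for a supplementary character, one otherwise.
        i += cp1 > 0xFFFF ? 2 : 1;
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return s1.length - s2.length;
}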

let s = [];
let s1 = "abc𞸁z";
let s2 = "abc𞸂z";

s = [s1, s2];
console.log(s);
s.sort(compareCodePoints);
console.log(s);

console.log()

s = [s2, s1];
console.log(s);
s.sort(compareCodePoints);
console.log(s);

console.log()

s1 = "a";
s2 = "";

console.log([s1, s2]);
console.log(compareCodePoints(s1, s2));
console.log([s2, s1]);
console.log(compareCodePoints(s2, s1));

$ node codepoint.poc.js
[ 'abc𞸁z', 'abc𞸂z' ]
[ 'abc𞸁z', 'abc𞸂z' ]

[ 'abc𞸂z', 'abc𞸁z' ]
[ 'abc𞸁z', 'abc𞸂z' ]

[ 'a', '' ]
1
[ '', 'a' ]
-1
$
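
For the special case in the question, where the strings are guaranteed to contain no lone surrogates, one commonly described alternative is to compare UTF-16 code units directly and remap them so that surrogates sort above the rest of the BMP. The following is only a sketch under that assumption: `fixUpCodeUnit` and `compareWellFormed` are hypothetical names, and the result is not guaranteed to match code point order if a lone surrogate is present. Whether it is actually faster than the POC above would have to be measured.

```javascript
'use strict';

// Hypothetical helper: remap a UTF-16 code unit so that the numeric order of
// remapped units matches code point order (assuming well-formed UTF-16).
function fixUpCodeUnit(u) {
    if (u < 0xD800) return u;           // below the surrogate range: unchanged
    if (u < 0xE000) return u + 0x2000;  // surrogates D800..DFFF -> F800..FFFF
    return u - 0x0800;                  // E000..FFFF -> D800..F7FF
}

// Assumption: neither string contains a lone surrogate.
function compareWellFormed(s1, s2) {
    const len = Math.min(s1.length, s2.length);
    for (let i = 0; i < len; i++) {
        const u1 = s1.charCodeAt(i);
        const u2 = s2.charCodeAt(i);
        if (u1 !== u2) {
            // The first differing code unit decides the order after remapping.
            return fixUpCodeUnit(u1) - fixUpCodeUnit(u2);
        }
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return s1.length - s2.length;
}

const sorted = ["\u{1F4A9}", "\uFFFF", "a", "Z"].sort(compareWellFormed);
console.log(sorted.map(s => s.codePointAt(0).toString(16)));
```

The remapping only needs to run at the first position where the two strings differ, so the common prefix is compared with plain `charCodeAt` calls and no surrogate decoding.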
