
How to compare Unicode strings in JavaScript?

When I write "Ł" > "Z" in JavaScript, it returns true. In Polish alphabetical order it should, of course, be false. How do I fix this? My site uses UTF-8.

You can use Intl.Collator, introduced by the ECMAScript Internationalization API, or String.prototype.localeCompare, whose locales argument was added by that same API:

"Ł".localeCompare("Z", "pl");               // a negative number, e.g. -1
new Intl.Collator("pl").compare("Ł", "Z");  // a negative number, e.g. -1

A negative result means that Ł comes before Z, as you want.
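The same collator can drive Array.prototype.sort directly. A minimal sketch, assuming an engine with Polish locale data (current browsers and Node.js builds with full ICU have it); the letter list is made up for illustration:

```javascript
// Create the collator once and reuse it; MDN recommends this over calling
// localeCompare with a locale argument for every comparison in a large sort.
var polish = new Intl.Collator("pl");

var letters = ["Z", "Ł", "L", "A"];
// Intl.Collator.prototype.compare is a bound function, so it can be passed as-is.
var sorted = letters.slice().sort(polish.compare);

console.log(sorted); // [ 'A', 'L', 'Ł', 'Z' ]
```

Sorting a copy (slice()) keeps the original array untouched; drop it if in-place sorting is fine.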

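Note, though, that this only works in browsers that implement the ECMAScript Internationalization API; in older ones Intl is undefined and localeCompare ignores the locale argument.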
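To check at runtime whether localeCompare honours the locale argument, you can use the test described on MDN: an engine implementing the Internationalization API rejects the invalid language tag "i" with a RangeError, while an older engine silently ignores it. A sketch; the fallback to plain code-unit order is only an illustrative choice:

```javascript
// Returns true if String.prototype.localeCompare supports the locales argument.
function localeCompareSupportsLocales() {
  try {
    // "i" is not a valid language tag; ECMA-402 engines throw a RangeError.
    "foo".localeCompare("bar", "i");
  } catch (e) {
    return e.name === "RangeError";
  }
  return false;
}

// Pick a comparator: Polish collation when available, else code-unit order.
var comparePolish = localeCompareSupportsLocales()
  ? function (a, b) { return a.localeCompare(b, "pl"); }
  : function (a, b) { return a < b ? -1 : a > b ? 1 : 0; };

console.log(comparePolish("Ł", "Z") < 0); // true where the Intl API is available
```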

Here is an example for the French alphabet that could help you with a custom sort:

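var alpha = function(alphabet, dir, caseSensitive){
  dir = dir || 1;                          // 1 = ascending, -1 = descending
  caseSensitive = caseSensitive || false;
  return function(a, b){
    if(!caseSensitive){
      a = a.toLowerCase();
      b = b.toLowerCase();
    }
    if(a === b){ return 0; }               // equal strings: report a tie
    var pos = 0,
      min = Math.min(a.length, b.length);
    // skip the common prefix
    while(pos < min && a.charAt(pos) === b.charAt(pos)){ pos++; }
    // one string is a prefix of the other: the shorter one sorts first
    if(pos === min){ return a.length > b.length ? dir : -dir; }
    // otherwise compare the first differing letters by their alphabet position
    return alphabet.indexOf(a.charAt(pos)) > alphabet.indexOf(b.charAt(pos)) ?
      dir : -dir;
  };
};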

To use it on an array of strings a:

a.sort(
  alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZaàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz')
);

Pass 1 or -1 as the second parameter of alpha() to sort ascending or descending.
Pass true as the third parameter for a case-sensitive sort.

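You may need to add digits and special characters to the alphabet string: characters missing from it all get index -1, so they sort before every listed character and are not ordered among themselves.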
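One possible variation (not Mic's code): precompute each listed character's rank in a Map instead of calling indexOf per comparison, and give every unlisted character a rank after all listed ones, ordered by code unit among themselves. With plain indexOf, every character missing from the alphabet gets the same index -1. The name makeAlphabetComparator and the sample alphabet are hypothetical:

```javascript
// Hypothetical helper: comparator for a custom alphabet given in sort order.
function makeAlphabetComparator(alphabet, dir) {
  dir = dir || 1; // 1 = ascending, -1 = descending
  var rank = new Map();
  for (var i = 0; i < alphabet.length; i++) {
    // keep the first position if a character is listed twice
    if (!rank.has(alphabet[i])) rank.set(alphabet[i], i);
  }
  // Unlisted characters rank after every listed one, in code-unit order.
  function rankOf(ch) {
    return rank.has(ch) ? rank.get(ch) : alphabet.length + ch.charCodeAt(0);
  }
  return function (a, b) {
    var min = Math.min(a.length, b.length);
    for (var pos = 0; pos < min; pos++) {
      var diff = rankOf(a[pos]) - rankOf(b[pos]);
      if (diff !== 0) return dir * Math.sign(diff);
    }
    // Common prefix: the shorter string sorts first; equal strings give 0.
    return dir * Math.sign(a.length - b.length);
  };
}

var alphabetFr = "0123456789aàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz";
var ascending = makeAlphabetComparator(alphabetFr);
var words = ["zèbre", "été", "2e", "ete", "abc!", "abc"];
console.log(words.slice().sort(ascending));
// [ '2e', 'abc', 'abc!', 'ete', 'été', 'zèbre' ]
```

Unlike alpha(), this sketch does not lower-case its inputs, and it works on UTF-16 code units, so strings should be in composed (NFC) form.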

You may be able to build your own sorting function using localeCompare(), which, at least according to the MDC article on the topic, should sort things correctly.
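A sort function built on localeCompare() can also take an options object (part of the Internationalization API, so the caveat about older browsers applies). A minimal sketch; byLocale is a hypothetical helper name, and sensitivity: "base" is just one possible option, shown because it makes accents and case count as no difference at all:

```javascript
// Hypothetical helper: a comparator fixed to one locale and options object.
function byLocale(locale, options) {
  return function (a, b) {
    return a.localeCompare(b, locale, options);
  };
}

var letters = ["b", "á", "a", "Z"];

// Plain sort() compares UTF-16 code units: "Z" (U+005A) first, "á" (U+00E1) last.
var byCodeUnit = letters.slice().sort();

// French collation: accents and case only break ties between equal base letters.
var french = letters.slice().sort(byLocale("fr"));

// With sensitivity "base", "a" and "á" compare as equal.
var baseOnly = byLocale("fr", { sensitivity: "base" });

console.log(byCodeUnit);         // [ 'Z', 'a', 'b', 'á' ]
console.log(french);             // [ 'a', 'á', 'b', 'Z' ]
console.log(baseOnly("a", "á")); // 0
```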

If that doesn't work out, here is an interesting SO question where the OP employs string replacement to build a "brute-force" sorting mechanism.

Also in that question, the OP shows how to build a custom textExtract function for the jQuery tablesorter plugin that does locale-aware sorting; maybe also worth a look.

Edit: As a totally far-out idea (I have no idea whether this is feasible at all, especially because of performance concerns): if you are working with PHP/MySQL on the back end anyway, I would like to mention the possibility of sending an Ajax query to a MySQL instance to have the data sorted there. MySQL is great at sorting locale-aware data, because you can force sorting operations into a specific collation using e.g. ORDER BY xyz COLLATE utf8_polish_ci, COLLATE utf8_german2_ci, etc. Those collations would take care of all sorting woes at once.

Mic's code, improved to handle characters not listed in the alphabet:

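var alpha = function(alphabet, dir, caseSensitive){
  dir = dir || 1;                          // 1 = ascending, -1 = descending
  caseSensitive = caseSensitive || false;
  // true if letter a should sort after letter b
  function compareLetters(a, b) {
    var ia = alphabet.indexOf(a);
    var ib = alphabet.indexOf(b);
    if(ia === -1 || ib === -1) {
      // only b is listed: the unlisted a is compared with 'a' by code unit
      if(ib !== -1)
        return a > 'a';
      // only a is listed: the unlisted b is compared with 'a' by code unit
      if(ia !== -1)
        return 'a' > b;
      // neither is listed: plain code-unit comparison
      return a > b;
    }
    return ia > ib;
  }
  return function(a, b){
    if(!caseSensitive){
      a = a.toLowerCase();
      b = b.toLowerCase();
    }
    if(a === b) return 0;                  // equal strings: report a tie
    var pos = 0;
    var min = Math.min(a.length, b.length);
    // skip the common prefix
    while(pos < min && a.charAt(pos) === b.charAt(pos)){ pos++; }
    // one string is a prefix of the other: the shorter one sorts first
    if(pos === min) return a.length > b.length ? dir : -dir;
    return compareLetters(a.charAt(pos), b.charAt(pos)) ? dir : -dir;
  };
};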

function assert(bCondition, sErrorMessage) {
  if (!bCondition) {
    throw new Error(sErrorMessage);
  }
}

assert(alpha("bac")("a", "b") === 1, "b sorts before a");
assert(alpha("abc")("ac", "a") === 1, "a shorter string sorts before a longer one");
assert(alpha("abc")("1abc", "0abc") === 1, "unlisted chars are compared as usual");
assert(alpha("abc")("0abc", "1abc") === -1, "unlisted chars are compared as usual [2]");
assert(alpha("abc")("0abc", "bbc") === -1, "unlisted chars are compared with listed chars in a special way");
assert(alpha("abc")("zabc", "abc") === 1, "unlisted chars are compared with listed chars in a special way [2]");

You have to keep two sort-key strings. One is for the primary order, where German ä=a (primary key ä->a) and French é=e (primary key é->e), and one for the secondary order, where ä comes after a (translating ä->azzzz in the secondary key) or é comes after e (secondary key é->ezzzz). Especially in Czech, some letters are variants of a base letter (áéí…) whereas others are letters in their own right in the alphabet (ABCČD…GHChI…RŘSŠT…). Plus there is the problem of treating digraphs as single letters (primary ch->hzzzz). Not a trivial problem, and there should be a solution within JS.
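The two-key idea can be sketched with the same "zzzz" string replacement. This is a simplified illustration, not a full collation: the PRIMARY and SECONDARY tables cover only a handful of Czech letters and, like the helper names and sample words, are assumptions. The trick also misorders genuine words that continue with several z's (e.g. a word starting "czz" vs "č").

```javascript
// Hypothetical, partial tables for Czech (keys are matched in lower-cased words).
// Primary key: letters with their own slot (č, ř, š, ž and the digraph ch) are
// pushed behind their base letter; variants such as á, é, í collapse onto it.
var PRIMARY = {
  "ch": "hzzzz", "č": "czzzz", "ř": "rzzzz", "š": "szzzz", "ž": "zzzzz",
  "á": "a", "é": "e", "ě": "e", "í": "i", "ó": "o", "ú": "u", "ů": "u", "ý": "y"
};
// Secondary key: as above, except variants now sort just after their base
// letter, so they only decide ties in the primary key.
var SECONDARY = {
  "ch": "hzzzz", "č": "czzzz", "ř": "rzzzz", "š": "szzzz", "ž": "zzzzz",
  "á": "azzzz", "é": "ezzzz", "ě": "ezzzz", "í": "izzzz", "ó": "ozzzz",
  "ú": "uzzzz", "ů": "uzzzz", "ý": "yzzzz"
};

// Build a function that rewrites a word according to one table.
// Longer keys come first in the pattern so a digraph wins over a single letter.
function makeKeyFn(table) {
  var keys = Object.keys(table).sort(function (a, b) { return b.length - a.length; });
  var pattern = new RegExp(keys.join("|"), "g");
  return function (word) {
    return word.toLowerCase().replace(pattern, function (m) { return table[m]; });
  };
}

var primaryKey = makeKeyFn(PRIMARY);
var secondaryKey = makeKeyFn(SECONDARY);

function plainCompare(x, y) { return x < y ? -1 : x > y ? 1 : 0; }

// Compare by primary key first; fall back to the secondary key on a tie.
function compareCzech(a, b) {
  return plainCompare(primaryKey(a), primaryKey(b)) ||
         plainCompare(secondaryKey(a), secondaryKey(b));
}

var words = ["zima", "chata", "být", "hrad", "čaj", "byt", "cesta"];
var sortedWords = words.slice().sort(compareCzech);
console.log(sortedWords);
```

With these tables the sample list comes out as byt, být, cesta, čaj, hrad, chata, zima: č follows c, ch follows h, and být only differs from byt in the secondary key.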

Funny: I had to think about this problem myself and ended up searching here, when it came to mind that I can use my own JavaScript module. I wrote a module that generates clean URLs, so it has to transliterate the input string (http://pid.github.io/speakingurl/).

var mySlug = require('speakingurl').createSlug({
    maintainCase: true,
    separator: " "
});

var input = "Schöner Titel läßt grüßen!? Bel été !";
var result;

result = mySlug(input);
console.log(result); // Output: "Schoener Titel laesst gruessen Bel ete"

Now you can sort by these results. For example, you can store the original title in a field "title" and the result of mySlug in a field "title_sort" that is used for sorting.
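The "title" / "title_sort" idea can be sketched without the module. As a stand-in for speakingurl's transliteration (this sketch does not use speakingurl), it strips diacritics with String.prototype.normalize("NFD") and a regex over combining marks; unlike speakingurl, that turns ö into o rather than oe and leaves ß unchanged. The record shape and the toSortKey name are hypothetical:

```javascript
// Stand-in transliteration: decompose, then drop combining marks
// (U+0300 to U+036F), so "é" becomes "e" and "ö" becomes "o".
function toSortKey(title) {
  return title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

var records = ["Zürich", "Ärger", "Apfel", "Öl", "Ende"].map(function (title) {
  // store the original and the derived sort key side by side
  return { title: title, title_sort: toSortKey(title) };
});

// sort on the precomputed key only; the original title is left untouched
records.sort(function (x, y) {
  return x.title_sort < y.title_sort ? -1 : x.title_sort > y.title_sort ? 1 : 0;
});

console.log(records.map(function (r) { return r.title; }));
// [ 'Apfel', 'Ärger', 'Ende', 'Öl', 'Zürich' ]
```

Computing the key once per record, rather than inside the comparator, keeps the sort itself cheap.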

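Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.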

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM