How to compare chars when the chars contain emojis?

General Overview:

I have a list of names, and each name is a std::string. A common thing to do with a list of strings is to sort them in alphabetical order.

One way to do this is to convert both strings to the same case, start with the first character in each string, and compare the characters to see which comes first in the alphabet, along the lines of if (char1 > char2), repeating until the two characters being compared are not equal or until the last character of the shorter string is reached.

Emoji characters always evaluate to ... interesting ... char values. With a sorting algorithm like the one described above, emoji chars are always sorted before alphanumeric characters.

Goal: Whether emoji strings, or strings that merely start with an emoji, sort before or after purely alphanumeric strings is arbitrary. I'd like to be able to control where in alphabetical order emoji characters/strings are sorted: either after 'Z'/'z' or before 'A'/'a'.

(To be clear, I'm not saying I'd like to control where they are sorted to the point of placing them between other arbitrary characters like 'p' and 'q', and I'm not saying my goal is to control how emojis are ordered relative to other emojis.)

Some code to demonstrate:

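#include <cctype>
#include <cstddef>
#include <string>

// Returns true if str1 sorts before str2, ignoring (ASCII) case.
bool compareStringsIgnoreCase(const std::string& str1, const std::string& str2)
{
   std::size_t i = 0;
   while (i < str1.length() && i < str2.length())
   {
      // std::tolower requires a value representable as unsigned char
      char firstChar = static_cast<char>(std::tolower(static_cast<unsigned char>(str1[i])));
      char secondChar = static_cast<char>(std::tolower(static_cast<unsigned char>(str2[i])));

      // Where char is signed, bytes >= 0x80 become negative here
      int firstCharAsInt = firstChar;
      int secondCharAsInt = secondChar;

      if (firstCharAsInt < secondCharAsInt)
           return true;
      else if (firstCharAsInt > secondCharAsInt)
           return false;
      i++;
   }
   return (str1.length() < str2.length());
}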

If using str1 = "Abc" and str2 = "👍", then when i = 0, the other values are as follows:

firstChar = 'a'

secondChar = '\xf0'

firstCharAsInt = 97

secondCharAsInt = -16

With these values, firstCharAsInt > secondCharAsInt, so the function returns false, and the emoji string is sorted before the "Abc" string. Again, what I'd like to be able to do is have emojis sorted after alphanumeric characters. The question is, how?

I tried out a handful of emojis, and their "char as int" values are always negative. Are emojis unique among chars in this way? If so, that could be a simple check for identifying them and placing them after other chars. I'm open to other approaches as well.

Thanks

Emojis are Unicode characters, so assuming your strings are encoded as UTF-8, the easiest way to compare them is to convert them to a std::wstring. You can do this using std::wstring_convert with std::codecvt_utf8. Although these are deprecated in C++17, there is currently no convenient replacement.

So, one can do:

#include <string>
#include <codecvt>
#include <locale>
#include <cwctype>

// Convert a UTF-8 encoded string to a wide string
std::wstring widen (const std::string &s)
{
    std::wstring_convert <std::codecvt_utf8 <wchar_t>, wchar_t> convert;
    return convert.from_bytes (s);
}

// Lower-case in place; towlower is the wide-character counterpart of tolower
void lower_case_string (std::wstring &ws)
{
    for (auto &ch : ws)
        ch = std::towlower (ch);
}

// Return true if s1 == s2 (UTF-8, case insensitive)
bool compare (const std::string &s1, const std::string &s2)
{
    std::wstring ws1 = widen (s1);
    lower_case_string (ws1);
    std::wstring ws2 = widen (s2);
    lower_case_string (ws2);
    return ws1 == ws2;
}

Note, though, that a comparison function used for sorting would return ws1 < ws2 rather than ws1 == ws2.
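
As a rough sketch of that less-than variant, something like the following could be passed to std::sort. The names widenUtf8, lowerCased and lessUtf8IgnoreCase are made up for this illustration, and it assumes UTF-8 input and a 32-bit wchar_t (as on Linux or macOS); where wchar_t is 16 bits, code points outside the Basic Multilingual Plane, which includes most emojis, may fail to convert with this facet.

```cpp
#include <codecvt>
#include <cwctype>
#include <locale>
#include <string>

// Convert UTF-8 to a wide string.
// Note: std::wstring_convert and std::codecvt_utf8 are deprecated since C++17.
std::wstring widenUtf8(const std::string& s)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert;
    return convert.from_bytes(s);
}

// Return a lower-cased copy, using the wide-character towlower.
std::wstring lowerCased(std::wstring ws)
{
    for (auto& ch : ws)
        ch = static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(ch)));
    return ws;
}

// Hypothetical comparator: true if s1 sorts before s2 (case insensitive).
// Emoji code points are numerically larger than ASCII letters,
// so they end up after them.
bool lessUtf8IgnoreCase(const std::string& s1, const std::string& s2)
{
    return lowerCased(widenUtf8(s1)) < lowerCased(widenUtf8(s2));
}
```

With a std::vector<std::string> called names, the call would be std::sort(names.begin(), names.end(), lessUtf8IgnoreCase). This converts both strings on every comparison, which is simple but not cheap for long lists.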


To answer my own proposed approach: emojis are not unique in having negative "char as int" values.

Other symbols, '§' for instance, also evaluate to a negative value, in this case -62, and so are also sorted before alphanumeric characters.
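
To see where those numbers come from, here is a minimal sketch that reads the first byte of a UTF-8 string both as a signed and as an unsigned value. The helper names leadByteSigned and leadByteUnsigned are invented for this example, and the strings are assumed to be non-empty and UTF-8 encoded.

```cpp
#include <string>

// Hypothetical helper: first byte of s as a signed 8-bit value, which is
// what a plain char holds on platforms where char is signed.
// Assumes s is non-empty.
int leadByteSigned(const std::string& s)
{
    return static_cast<signed char>(s[0]);
}

// Hypothetical helper: the same byte read as an unsigned value (0..255).
int leadByteUnsigned(const std::string& s)
{
    return static_cast<unsigned char>(s[0]);
}
```

For "\xC2\xA7" (the UTF-8 bytes of '§') the signed reading is -62, and for "\xF0\x9F\x91\x8D" (👍) it is -16, while the unsigned readings are 194 and 240. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so any non-ASCII character looks negative through a signed char.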

Checking for these negative values and using them to adjust the sort order does change where emojis are sorted, but it also changes the sort order of other, unrelated characters, which makes this approach an imperfect solution to the original goal.
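
A sketch of that check, under the assumption that the strings are UTF-8, might look like the following. The names isNonAsciiByte and lessIgnoreCase and the nonAsciiLast flag are hypothetical. As noted, the check cannot tell emojis apart from other non-ASCII characters; it only lets the caller pick whether that whole group goes before 'A'/'a' or after 'Z'/'z'.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true for any byte >= 0x80, i.e. a byte of a
// multi-byte UTF-8 sequence (emoji, but also '§', accented letters, ...).
bool isNonAsciiByte(char c)
{
    return static_cast<unsigned char>(c) >= 0x80;
}

// Returns true if str1 should sort before str2 (ASCII case insensitive).
// nonAsciiLast selects whether non-ASCII bytes sort after or before ASCII.
bool lessIgnoreCase(const std::string& str1, const std::string& str2,
                    bool nonAsciiLast)
{
    const std::size_t n = std::min(str1.size(), str2.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        const bool high1 = isNonAsciiByte(str1[i]);
        const bool high2 = isNonAsciiByte(str2[i]);
        if (high1 != high2)
            // Exactly one side is non-ASCII: the flag decides the order.
            return nonAsciiLast ? high2 : high1;

        const int c1 = std::tolower(static_cast<unsigned char>(str1[i]));
        const int c2 = std::tolower(static_cast<unsigned char>(str2[i]));
        if (c1 != c2)
            return c1 < c2;
    }
    // All shared positions are equal: the shorter string sorts first.
    return str1.size() < str2.size();
}
```

To use it with std::sort, the flag can be bound in a lambda, for example [](const std::string& a, const std::string& b) { return lessIgnoreCase(a, b, true); }.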

A simple and clean way to do this would be to cast the "char as int" values to unsigned ints. Under two's complement, the negative values are converted to large positive values and thus sort after the other positive values.
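
Here is one way that idea could look in code. This sketch converts each byte to unsigned char rather than unsigned int, which gives the same relative ordering and also keeps std::tolower well defined. The function name lessIgnoreCaseUnsigned is made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive (ASCII only) "less than" that compares bytes as unsigned
// values, so UTF-8 bytes (>= 0x80) sort after every ASCII character.
bool lessIgnoreCaseUnsigned(const std::string& str1, const std::string& str2)
{
    const std::size_t n = std::min(str1.size(), str2.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        // Bytes are read as 0..255, never as negative values.
        const int c1 = std::tolower(static_cast<unsigned char>(str1[i]));
        const int c2 = std::tolower(static_cast<unsigned char>(str2[i]));
        if (c1 != c2)
            return c1 < c2;
    }
    // All shared positions are equal: the shorter string sorts first.
    return str1.size() < str2.size();
}
```

Passing it as the third argument to std::sort places "Abc" before "👍". Like the negative-value check, it treats all non-ASCII characters the same way, not only emojis.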
