
Comparing UTF-8 strings

I'm trying to compare two strings, say Émilie and Zoey. 'E' comes before 'Z', but in byte order Z comes before É (É lies outside the ASCII range), so a normal if ( str1 > str2 ) won't work.

I tried if (strcmp(str1,str2) > 0), but it still doesn't work. So I'm looking for a native way to compare strings containing UTF-8 characters.
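To make the problem concrete, here is a minimal sketch of why the byte-wise comparison goes wrong. It assumes the source file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Illustrative values; the names come from the question above.
$str1 = 'Émilie';
$str2 = 'Zoey';

// In UTF-8, 'É' is stored as the two bytes 0xC3 0x89, while 'Z' is the single byte 0x5A.
$firstBytes1 = bin2hex(substr($str1, 0, 2));
$firstByte2 = bin2hex(substr($str2, 0, 1));

// strcmp() compares byte by byte, so 0xC3 > 0x5A places "Émilie" after "Zoey".
$byteOrder = strcmp($str1, $str2);

echo $firstBytes1, ' vs ', $firstByte2, ': ';
echo $byteOrder > 0 ? 'Émilie sorts after Zoey' : 'Émilie sorts before Zoey', "\n";
```

The same byte ordering applies to the > operator on strings, which is why both attempts give the same result.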

IMPORTANT

This answer is meant for situations where it's not possible to run or install the 'intl' extension; it sorts strings only by replacing accented characters with non-accented characters. To sort accented characters according to a specific locale, using a Collator is a better approach; see the other answer to this question for more information.

Sorting by non-accented characters in PHP 5.2

You may try converting both strings to ASCII using iconv() and the //TRANSLIT option to get rid of accented characters:

$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);

Then do the comparison.

See the documentation here:

http://www.php.net/manual/en/function.iconv.php

[Updated, in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways. This problem is mentioned in this question: php iconv translit for removing accents: not working as excepted?

To make the iconv() approach work, I've added a code sample below that strips all non-word characters from the resulting string using preg_replace().

<?php

setLocale(LC_ALL, 'fr_FR');

$names = array(
   'Zoey and another (word) ',
   'Émilie and another word',
   'Amber',
);


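// Transliterate each name to ASCII, then strip everything that is
// neither a word character nor whitespace.
$converted = array();

foreach($names as $name) {
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

// The strings are now plain ASCII, so a byte-wise sort gives the expected order.
sort($converted);

echo '<pre>'; print_r($converted); echo '</pre>';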

// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word 
// )
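The preg_replace() step can be looked at in isolation. The sketch below does not call iconv() at all; the input string is an assumed example of what some iconv builds may return for 'Émilie and another (word) ', with a stray apostrophe standing in for the accent. What your platform actually produces may differ.

```php
<?php
// Assumed input, for illustration only: a transliteration that left an
// apostrophe in place of the accent.
$transliterated = "'Emilie and another (word) ";

// Remove every run of characters that is neither a word character (\w)
// nor whitespace (\s).
$cleaned = preg_replace('#[^\w\s]+#', '', $transliterated);

var_dump($cleaned);
```

The apostrophe and the parentheses are removed, while letters and spaces (including the trailing one) are kept.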

There is no native way to do this, but there is a PECL extension: http://php.net/manual/de/class.collator.php

$c = new Collator('fr_FR');
if ($c->compare('Émily', 'Zoey') < 0) { echo 'Émily < Zoey'; }
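The same Collator can drive a sort of a whole list. This is a minimal sketch rather than a definitive implementation: sortNamesWithCollator() is a hypothetical helper name, and returning the input unchanged when the intl extension is missing is an assumption made only to keep the example runnable everywhere.

```php
<?php
// Hypothetical helper: sorts names using locale-aware collation.
function sortNamesWithCollator(array $names, string $locale = 'fr_FR'): array
{
    if (!class_exists('Collator')) {
        // Assumption for this sketch: without intl, return the input as-is.
        return $names;
    }

    $collator = new Collator($locale);

    // Collator::compare() returns -1, 0 or 1, which is what usort() expects.
    usort($names, function ($a, $b) use ($collator) {
        return $collator->compare($a, $b);
    });

    return $names;
}

print_r(sortNamesWithCollator(['Zoey', 'Émilie', 'Amber']));
```

Unlike the transliteration approach, the original strings are left untouched; only the ordering is decided by the collator.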

I recommend using the usort function, which avoids modifying the values while still comparing them correctly.

Example:

<?php

setLocale(LC_ALL, 'fr_FR');

$names = [
   'Zoey and another (word) ',
   'Émilie and another word',
   'Amber'
];

function compare(string $a, string $b) {
    $a = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $a));
    $b = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $b));

    return strcmp($a, $b);
}

usort($names, 'compare');

echo '<pre>';
print_r($names);
echo '</pre>';

With the result:

Array
(
    [0] => Amber
    [1] => Émilie and another word
    [2] => Zoey and another (word) 
)

Here's something that works for me, although I'm not sure whether it serves the purpose of comparing the special characters that other languages have.

I'm just using the mb_strpos function and looking at the result. I guess that is as close as you can get to a native comparison of UTF-8 strings:

if (mb_strpos(mb_strtolower($search_in), $search_for) !== false) {
    //do stuff
}
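Note that mb_strpos tests whether one string occurs inside another; it does not order them. A slightly fuller sketch of the check above might lowercase both sides, since the snippet lowercases only the haystack. The helper name containsIgnoringCase() is hypothetical, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: reports whether $needle occurs in $haystack, ignoring case.
function containsIgnoringCase(string $haystack, string $needle): bool
{
    // Lowercase both strings with an explicit encoding so that 'É' and 'é' match.
    $haystack = mb_strtolower($haystack, 'UTF-8');
    $needle = mb_strtolower($needle, 'UTF-8');

    // mb_strpos() returns false when there is no match, and 0 for a match at the start.
    return mb_strpos($haystack, $needle, 0, 'UTF-8') !== false;
}

var_dump(containsIgnoringCase('ÉMILIE and Zoey', 'émilie'));
var_dump(containsIgnoringCase('Amber', 'Émilie'));
```

The strict !== false check matters because a match at position 0 would otherwise be treated as no match.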


 