
Why doesn't Normalizer::normalize (PHP) work?

I'm trying to normalize strings with characters like 'áéíóú' to 'aeiou' to simplify searches.

According to the answer to this question, I should use the Normalizer class to do it.

The problem is that the normalize function appears to do nothing. For example, this code:

<?php echo 'Pérez, NFC: ' . normalizer_normalize('Pérez', Normalizer::NFC) 
    . ' NFD: ' .normalizer_normalize('Pérez', Normalizer::NFD)
    . ' NFKC: ' .normalizer_normalize('Pérez', Normalizer::NFKC) 
    . ' NFKD: ' .normalizer_normalize('Pérez', Normalizer::NFKD)?>
<br/>
<?php echo 'aáàä, êëéè,' 
    . ' FORM_C: ' . normalizer_normalize('aáàä, êëéè', Normalizer::FORM_C )
    . ' FORM_D: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_D)
    . ' FORM_KC: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KC)
    . ' FORM_KD: ' .normalizer_normalize('aáàä, êëéè', Normalizer::FORM_KD)?>

shows:

Pérez, NFC: Pérez NFD: Pérez NFKC: Pérez NFKD: Pérez
aáàä, êëéè, FORM_C: aáàä, êëéè FORM_D: aáàä, êëéè FORM_KC: aáàä, êëéè FORM_KD: aáàä, êëéè 

What is normalize supposed to do?

---EDITED---

It gets stranger. When I copy and paste the result from the web browser, in the editor and on the original page I can see:

FORM_D: aáàä, êëéè

but on the Stack Overflow question page I can see (only in code sample mode):

FORM_D: aáàä, êëéè

Found on this page (the linked document now has different wording; the old one no longer exists):

Unicode and internationalization is a large topic, but you should know at least one more important thing. For historical reasons, Unicode allows alternative representations of some characters. For example, á can be written either as one precomposed character á with the Unicode code point U+00E1 or as a decomposed sequence of the letter a (U+0061) combined with the accent ´ (U+0301). For purposes of comparison and sorting, two such representations should be taken as equal. To solve this, the intl library provides the Normalizer class. This class in turn provides the normalize() method, which you can use to convert a string to a normalized composed or decomposed form. Your application should consistently transform all strings to one or the other form before performing comparisons.

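// "\u{...}" escapes (PHP 7+) make the two representations unambiguous in source code.
echo Normalizer::normalize("a\u{0301}", Normalizer::FORM_C); // á (single code point U+00E1)
echo Normalizer::normalize("\u{00E1}", Normalizer::FORM_D); // a followed by combining acute U+0301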
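This also explains why the output in the question looks unchanged: the composed and decomposed forms render identically, and only the underlying bytes differ. A minimal sketch to make the difference visible, assuming the intl extension is loaded and PHP 7+ (the variable names are only for illustration):

```php
<?php
// Assumption: ext-intl is available; "\u{...}" escapes require PHP 7+.
$composed   = "\u{00E1}";   // á as a single precomposed code point
$decomposed = "a\u{0301}";  // letter a followed by the combining acute accent

$nfc = Normalizer::normalize($decomposed, Normalizer::FORM_C);
$nfd = Normalizer::normalize($composed, Normalizer::FORM_D);

// Both strings look like "á" on screen, but their UTF-8 bytes differ.
echo bin2hex($nfc), "\n"; // c3a1
echo bin2hex($nfd), "\n"; // 61cc81

// Normalizing to a common form makes the two spellings compare equal.
var_dump($nfc === $composed);   // bool(true)
var_dump($nfd === $decomposed); // bool(true)
```

Dumping the bytes with bin2hex() is a quick way to check which form a string is in when the browser or editor hides the difference.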

So eliminating accents (and similar) is not the purpose of Normalizer.

What you are looking for is iconv("UTF-8", "ASCII//TRANSLIT", $text). (With ISO-8859-1 as the target the accents would survive, because that charset already contains characters such as á and é.)

http://php.net/manual/function.iconv.php

Be careful with LC_* settings! Depending on the setting, the transliteration might change.
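One way to keep the locale dependence under control is to set LC_CTYPE explicitly around the call. The following is only a sketch: translit_to_ascii() is a hypothetical helper, the default locale name en_US.UTF-8 is an assumption that must match a locale installed on the system, and the target charset is ASCII because ISO-8859-1 itself contains accented letters. The exact replacement characters depend on the platform's iconv implementation, so no output is shown.

```php
<?php
/**
 * Hypothetical helper: transliterate UTF-8 text to ASCII under a known locale.
 * Returns the converted string, or false if iconv() fails.
 */
function translit_to_ascii(string $text, string $locale = 'en_US.UTF-8')
{
    // Passing "0" only queries the current LC_CTYPE setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // If the requested locale is not installed, setlocale() returns false
    // and the current locale stays in effect.
    setlocale(LC_CTYPE, $locale);

    $result = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // Restore the previous locale so other code is not affected.
    setlocale(LC_CTYPE, $previous);

    return $result;
}

echo translit_to_ascii('Pérez'), "\n";
```

Restoring the previous locale matters because setlocale() changes process-wide state.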

Normalizer with FORM_D can split the diacritics out from the base characters; preg_replace can then eliminate the diacritics:

$string = 'áéíóú';
echo preg_replace('/[\x{0300}-\x{036f}]/u', "", Normalizer::normalize($string , Normalizer::FORM_D));
//aeiou
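For the search use case in the question, the same idea can be wrapped in a function that is applied to both the stored text and the search term. This is a sketch, not a definitive implementation: search_key() is a hypothetical name, the lowercasing step is an addition that is not part of the answer above, and the intl and mbstring extensions are assumed to be loaded.

```php
<?php
/**
 * Hypothetical helper: build an accent-insensitive, case-insensitive key.
 * Assumes ext-intl (Normalizer) and ext-mbstring (mb_strtolower).
 */
function search_key(string $text): string
{
    // Decompose so that each accent becomes a separate combining mark.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // normalize() fails on invalid UTF-8; fall back to the raw input.
        return $text;
    }

    // Remove the combining diacritical marks (U+0300 to U+036F).
    $stripped = preg_replace('/[\x{0300}-\x{036f}]/u', '', $decomposed);

    // Lowercasing is an extra step added here for searching.
    return mb_strtolower($stripped, 'UTF-8');
}

echo search_key('áéíóú'), "\n"; // aeiou
echo search_key("P\u{00E9}rez"), "\n"; // perez

// Composed and decomposed spellings yield the same key.
var_dump(search_key("P\u{00E9}rez") === search_key("Pe\u{0301}rez")); // bool(true)
```

Letters that have no canonical decomposition, such as ø, pass through unchanged, so this approach does not cover every character that a transliteration table would.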

For a function that actually removes the accents, the best I have found so far is remove_accents($string) in the WordPress core: https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L1127

(Note: I have filed a bug against it asking them to adopt an updated version I provided, which documents each character and how it is translated, so it may change in the future.)
