Comparing two Unicode strings in PHP

I am stuck comparing two Unicode strings in PHP that both contain the special character 'ö'. One string comes from $_GET, the other is a folder name read from the filesystem with scandir(). Both strings look equal to me. Running

var_dump($filter);
var_dump($tail . '/' . $k);

on them also shows them as equal, but with different string lengths (?!):

string '/blöb' (length=7)
string '/blöb' (length=6)

My snippet comparing them looks as follows:

if($filter == ($tail . '/' . $k)) {
    /* ... */
}

What's going on here?

Additional information: $tail is an empty string:

string '' (length=0)

Read up on Unicode equivalence (http://en.wikipedia.org/wiki/Unicode_equivalence) and use PHP's Normalizer class (http://www.php.net/manual/en/class.normalizer.php).

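You probably have a decomposed character in the longer string: a plain o followed by a combining diaeresis (U+0308), which is rendered on top of the preceding character. In UTF-8 that sequence takes three bytes instead of the two used by the precomposed ö, which matches the length difference of one.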
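To see the difference between the two forms, here is a minimal sketch that builds both variants by hand. The variable names $composed and $decomposed are made up for illustration, and the "\u{...}" escape assumes PHP 7 or later; which of your two strings is in which form is something you would have to check with bin2hex() yourself.

```php
<?php
// Precomposed form: "ö" is the single code point U+00F6 (2 bytes in UTF-8).
$composed = "/bl\u{00F6}b";
// Decomposed form: "o" followed by U+0308 COMBINING DIAERESIS (1 + 2 bytes).
$decomposed = "/blo\u{0308}b";

// strlen() counts bytes, not characters.
var_dump(strlen($composed));        // int(6)
var_dump(strlen($decomposed));      // int(7)

// The hex dumps show where the byte sequences diverge.
var_dump(bin2hex($composed));       // string(12) "2f626cc3b662"
var_dump(bin2hex($decomposed));     // string(14) "2f626c6fcc8862"

// A plain comparison works on bytes, so the strings are not equal.
var_dump($composed == $decomposed); // bool(false)
```

Both strings render as '/blöb', which is why the var_dump() output looks identical apart from the length.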

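Normalizer::normalize() will fix things like that: it converts both strings to the same normalization form (for example NFC), after which they compare as equal.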

As a side note, you should always normalize your input if you are going to compare it for equivalence. A username is a good example: you want to make sure two people can't choose the same username, even if the binary representations of the strings happen to differ.
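A rough sketch of a normalize-then-compare helper could look like the following. It assumes the intl extension (which provides the Normalizer class) is installed and enabled; the function name sameText() is hypothetical, and choosing NFC over the other normalization forms is just one reasonable option.

```php
<?php
// Requires the intl extension (assumption: it is installed and enabled).

// Hypothetical helper: compares two UTF-8 strings after bringing both
// into Normalization Form C (precomposed characters).
function sameText(string $a, string $b): bool
{
    $na = Normalizer::normalize($a, Normalizer::FORM_C);
    $nb = Normalizer::normalize($b, Normalizer::FORM_C);

    // normalize() returns false on failure (e.g. invalid UTF-8);
    // treat that case as "not equal" instead of comparing false values.
    if ($na === false || $nb === false) {
        return false;
    }

    return $na === $nb;
}
```

In the question's snippet, the condition would then become something like if (sameText($filter, $tail . '/' . $k)) { /* ... */ }.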

Can you try passing both strings through utf8_encode() and comparing the results? PHP strings have no native Unicode support, so the manual suggests utf8_encode()/utf8_decode() for some basic Unicode handling.

http://php.net/manual/en/language.types.string.php
