
How to get the strlen of a multi-language string

I want to get the strlen() of strings in Shift-JIS and UTF-8, then compare them. A string can be mixed, for example "ああ12345678sdfdszzz". I tried strlen(), but it gives different results. mb_strlen() doesn't seem to help either, because this is a mixed string.

For example:

ああ12345678 >> strlen() = 24 chars
ああああああああああああああああ >> strlen() = 48 chars
ああああああああああああああああああ >> strlen() = 54 chars

There seems to be no rule. So what is the best way to calculate the length of multi-language strings and compare them?

strlen only counts bytes, so it is only useful for single-byte character encodings. For multi-byte character encodings, use mb_strlen, which counts the actual characters instead.
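To illustrate the difference between byte counts and character counts, here is a minimal sketch. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is UTF-8 encoded.
$utf8 = 'ああ12345678';

// Produce a Shift-JIS copy of the same text for comparison.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// strlen() counts bytes, so the result depends on the encoding.
$utf8Bytes = strlen($utf8); // 14: each あ is 3 bytes in UTF-8, plus 8 ASCII bytes
$sjisBytes = strlen($sjis); // 12: each あ is 2 bytes in Shift-JIS, plus 8 ASCII bytes

// mb_strlen() counts characters when it is told the correct encoding.
$utf8Chars = mb_strlen($utf8, 'UTF-8'); // 10
$sjisChars = mb_strlen($sjis, 'SJIS');  // 10

printf("UTF-8: %d bytes, %d chars\n", $utf8Bytes, $utf8Chars);
printf("SJIS:  %d bytes, %d chars\n", $sjisBytes, $sjisChars);
```

The byte counts differ between the two encodings, while the character counts agree as long as mb_strlen is given the encoding the string is actually in.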

I would write a function that determines where each encoding starts and ends in the string.

Then I would split the string by encoding, run mb_strlen on each part, and sum up the sizes afterwards. Then repeat for the second string and compare.

I guess you understand my point ;)

PS: Use mb_detect_encoding to detect the encoding.

mb_detect_encoding (see the comments for further ideas from the PHP community)
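As a rough sketch of the detection idea, the following applies mb_detect_encoding to a whole string rather than to individual segments, which is a simplification of the approach described above. The helper names guessEncoding and detectedLength and the candidate list are assumptions made for illustration. Detection is a heuristic: many UTF-8 byte sequences are also valid Shift-JIS, so the result can depend on the candidate order and the PHP version, and a known encoding should be preferred whenever one is available.

```php
<?php
// Hypothetical helper: guess the encoding from a short candidate list.
// Strict mode (third argument) rejects candidates with invalid byte sequences.
function guessEncoding(string $s, array $candidates = ['UTF-8', 'SJIS']): ?string
{
    $encoding = mb_detect_encoding($s, $candidates, true);
    return $encoding === false ? null : $encoding;
}

// Hypothetical helper: character count based on the guessed encoding,
// or null when none of the candidates matches.
function detectedLength(string $s): ?int
{
    $encoding = guessEncoding($s);
    return $encoding === null ? null : mb_strlen($s, $encoding);
}

// Simulate Shift-JIS input (assumes this file is saved as UTF-8).
$sjisInput = mb_convert_encoding('ああ12345678', 'SJIS', 'UTF-8');

var_dump(guessEncoding($sjisInput));
var_dump(detectedLength($sjisInput));
```

The Shift-JIS bytes in this example are not valid UTF-8, so strict mode can rule UTF-8 out. The reverse case is less clear-cut, which is why the sketch returns null instead of guessing when detection fails.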

// Count characters, not bytes; this assumes the submitted field is UTF-8 encoded.
$field = $_POST['field'];
$field_length = mb_strlen($field, 'UTF-8');
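To compare lengths across encodings, one option is to convert both strings to a common encoding first and count characters there. This is a sketch of that idea, not something stated in the answers above; it assumes the encoding of each input is already known, and lengthInChars is a hypothetical helper name.

```php
<?php
// Hypothetical helper: character count of $s, given its known encoding.
function lengthInChars(string $s, string $encoding): int
{
    // Normalize to UTF-8 so every string is measured the same way.
    $utf8 = mb_convert_encoding($s, 'UTF-8', $encoding);
    return mb_strlen($utf8, 'UTF-8');
}

// Assumption: this file is saved as UTF-8.
$utf8Input = 'ああ12345678sdfdszzz';
// Simulate the same text arriving as Shift-JIS.
$sjisInput = mb_convert_encoding($utf8Input, 'SJIS', 'UTF-8');

$utf8Length = lengthInChars($utf8Input, 'UTF-8'); // 18
$sjisLength = lengthInChars($sjisInput, 'SJIS');  // 18

var_dump($utf8Length === $sjisLength); // bool(true)
```

Converting first also means the normalized strings can be compared directly with ===, not only by length.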


 