I want to get the strlen() of strings in Shift-JIS and UTF-8 and then compare them. A string can be mixed, for example "ああ12345678sdfdszzz". I tried strlen(), but it gives different results. mb_strlen() also doesn't help because this is a mixed string.
For example:
ああ12345678 >> strlen() = 24 chars
ああああああああああああああああ >> strlen() = 48 chars
ああああああああああああああああああ >> strlen() = 54 chars
There seems to be no rule. So what is the best way to calculate the string length and compare the results for multilingual text?
strlen only counts bytes and is therefore only useful for single-byte character encodings; for multi-byte character encodings, use mb_strlen, which counts the actual characters instead.
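To make the byte vs. character difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the sample string is made up for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is UTF-8 bytes.
$utf8 = "あいう123";

// The same text re-encoded as Shift-JIS.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// strlen() counts bytes, so the result depends on the encoding.
$utf8Bytes = strlen($utf8); // 3 kana * 3 bytes + 3 ASCII bytes = 12
$sjisBytes = strlen($sjis); // 3 kana * 2 bytes + 3 ASCII bytes = 9

// mb_strlen() counts characters when told which encoding the bytes are in.
$utf8Chars = mb_strlen($utf8, 'UTF-8'); // 6
$sjisChars = mb_strlen($sjis, 'SJIS');  // 6

echo "$utf8Bytes $sjisBytes $utf8Chars $sjisChars\n"; // prints "12 9 6 6"
```

So the byte counts differ between the two encodings, while the character counts agree as long as the correct encoding name is passed to mb_strlen.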
I would write a function to check from where to where a particular encoding exists.
Then I would split the string by encoding, run mb_strlen on each part, and sum up the sizes afterwards. Then repeat for the second string and compare.
I guess you understand my point ;)
PS: Use mb_detect_encoding to detect the encoding (see the comments on its PHP manual page for further ideas from the PHP community).
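A simpler variant of the idea, as a rough sketch: instead of splitting, detect the encoding of each whole string and count characters in that encoding. The helper name char_length and the candidate list are my own assumptions, not part of PHP. Detection is a heuristic, and some byte sequences are valid in both UTF-8 and Shift-JIS, so if you already know the encoding, pass it to mb_strlen directly.

```php
<?php
// Hypothetical helper (not a PHP built-in): counts characters after
// guessing the encoding from a list of candidates.
function char_length(string $text, array $candidates = ['UTF-8', 'SJIS']): int
{
    // Strict mode (third argument true) rejects candidates in which
    // the bytes are not valid.
    $encoding = mb_detect_encoding($text, $candidates, true);
    if ($encoding === false) {
        throw new InvalidArgumentException('Could not detect the encoding.');
    }
    return mb_strlen($text, $encoding);
}

// Assumption: this file is saved as UTF-8.
$utf8 = "あ1い2";
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

$lenUtf8 = char_length($utf8);
$lenSjis = char_length($sjis);

// Same text, different byte encodings, same character count.
var_dump($lenUtf8 === $lenSjis);
```

The sample string was picked so that its bytes are only valid in one of the two encodings; for ambiguous input the guess may differ between PHP versions.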
$field = $_POST['field'];
$field_length = mb_strlen($field,'utf-8');