Getting wrong encoding when trying to replace Lithuanian characters
I have a problem with my string. After the for loop, I get some other symbols instead of the exact Lithuanian letters.
The goal is to replace the Lithuanian letters ą, č, ę, ė, į, š, ų, ū, ž with a1, c2, e1, e2, i1, s2, u1, u2, z2 respectively. I came up with this:
$ltSymbolsArray = array(
    'a1' => 'ą',
    'c2' => 'č',
    'e1' => 'ę',
    'e2' => 'ė',
    'i1' => 'į',
    's2' => 'š',
    'u1' => 'ų',
    'u2' => 'ū',
    'z2' => 'ž'
);
$string = 'ąsąžadcę';
for ($i = 0; $i < strlen($string); $i++) {
    foreach ($ltSymbolsArray as $key => $value) {
        if ($string[$i] == $value) {
            $string[$i] = $key;
        }
    }
}
It looks like a simple solution, but I can't get the encoding right. Encoding is a mystery to me, so I would really appreciate any help with this problem.
You can't simply iterate over a Unicode string byte by byte and expect each iteration to receive a full character when a single character is encoded in more than one byte. In UTF-8, each of these letters takes two bytes, so $string[$i] only ever holds half of one, and strlen() counts bytes, not characters.
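A small check makes this concrete. It assumes the script file is saved as UTF-8 and that the mbstring extension is available (for mb_strlen). It shows that strlen counts bytes, that $string[0] is a single byte, and therefore that comparing one byte with a two-byte letter such as 'ą' can never succeed:

```php
<?php
// Assumes this file is saved as UTF-8, so the literal below is UTF-8 encoded.
$string = 'ąsąžadcę';

// strlen() counts bytes: ą, ą, ž and ę take two bytes each.
echo strlen($string), "\n";             // 12
// mb_strlen() with an explicit encoding counts characters.
echo mb_strlen($string, 'UTF-8'), "\n"; // 8

// 'ą' is U+0105, encoded in UTF-8 as the two bytes 0xC4 0x85,
// so $string[0] and $string[1] each hold only one of those bytes.
echo bin2hex($string[0]), ' ', bin2hex($string[1]), "\n"; // c4 85

// A single byte never equals the two-byte string 'ą'.
var_dump($string[0] === 'ą'); // bool(false)
```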
Use preg_split in combination with the Unicode modifier (u) to split your string into valid Unicode characters. Then use the result to replace the characters in the original string.
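A minimal sketch of the preg_split approach, assuming the script and its input are UTF-8. It flips the question's array so each letter looks up its code, and it builds a new string rather than writing into $string[$i], because assigning a multi-byte string to a string offset stores only its first byte:

```php
<?php
// Sketch of the preg_split approach; assumes the script and its input are UTF-8.
$ltSymbolsArray = [
    'a1' => 'ą', 'c2' => 'č', 'e1' => 'ę', 'e2' => 'ė', 'i1' => 'į',
    's2' => 'š', 'u1' => 'ų', 'u2' => 'ū', 'z2' => 'ž',
];
// Flip the map so each Lithuanian letter is a key and its code is the value.
$replacements = array_flip($ltSymbolsArray);

$string = 'ąsąžadcę';

// An empty pattern with the u modifier splits between whole UTF-8 characters;
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
if ($chars === false) {
    // preg_split returns false when the subject is not valid UTF-8.
    throw new RuntimeException('Input is not valid UTF-8');
}

// Build a new string: assigning a multi-byte value to $string[$i] would
// store only its first byte.
$result = '';
foreach ($chars as $char) {
    // Use the replacement code if there is one, otherwise keep the character.
    $result .= $replacements[$char] ?? $char;
}

echo $result, "\n"; // a1sa1z2adce1
```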
You could also use one of the multibyte regex functions, such as mb_ereg_replace.
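A sketch of the mb_ereg_replace route, assuming the mbstring extension is built with multibyte regex support (the mb_ereg_* functions are missing otherwise). Because mb_ereg_replace takes a single replacement string, this version simply calls it once per letter in the map:

```php
<?php
// Sketch of the mb_ereg_replace approach; requires mbstring built with
// multibyte regex support.
mb_regex_encoding('UTF-8'); // treat patterns and subjects as UTF-8

$ltSymbolsArray = [
    'a1' => 'ą', 'c2' => 'č', 'e1' => 'ę', 'e2' => 'ė', 'i1' => 'į',
    's2' => 'š', 'u1' => 'ų', 'u2' => 'ū', 'z2' => 'ž',
];

$string = 'ąsąžadcę';
foreach ($ltSymbolsArray as $code => $letter) {
    // Each letter contains no regex metacharacters, so it works as a literal pattern.
    $string = mb_ereg_replace($letter, $code, $string);
}

echo $string, "\n"; // a1sa1z2adce1
```

For fixed literal replacements like these, plain strtr($string, array_flip($ltSymbolsArray)) should also work without any regex, since strtr matches whole byte sequences rather than single bytes.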