Getting the wrong encoding when trying to replace Lithuanian accented characters

I have a problem with my string. After the for loop I get some other symbols instead of the exact letters I started with. The goal is to replace the Lithuanian letters ąčęėįšųūž (Latin letters with diacritics, not Cyrillic) with a1, c2, e1, e2, i1, s2, u1, u2, z2 respectively. I came up with this:

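<?php
  // Map of replacement => Lithuanian letter
  $ltSymbolsArray = array(
      'a1' => 'ą',
      'c2' => 'č',
      'e1' => 'ę',
      'e2' => 'ė',
      'i1' => 'į',
      's2' => 's',
      'u1' => 'ų',
      'u2' => 'ū',
      'z2' => 'ž'
  );
  $string = 'ąsąžadcę';

  // Problem: strlen() and $string[$i] work on bytes, not characters, so a
  // two-byte UTF-8 letter such as 'ą' never equals a single $string[$i].
  for ($i = 0; $i < strlen($string); $i++) {
    foreach ($ltSymbolsArray as $key => $value) {
      if ($string[$i] == $value) {
        // Even if it matched, a string offset holds one byte:
        // only the first character of $key would be written here.
        $string[$i] = $key;
      }
    }
  }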

It looks like a simple solution, but I can't handle the encoding. Encoding is still a mystery to me, so I would really appreciate any help with this problem.

You can't simply iterate over a UTF-8 string byte by byte and expect each iteration to give you a full character. A letter such as ą takes more than one byte (two in UTF-8), so strlen() and $string[$i] only ever see fragments of it.
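A small sketch that makes the byte/character mismatch visible. It assumes the script file is saved as UTF-8 and that the mbstring extension (for mb_strlen()) is installed.

```php
<?php
// Assumes this file is saved as UTF-8, so the literal is stored as UTF-8 bytes.
$string = 'ąsąžadcę';

// strlen() counts bytes; each accented letter here occupies two bytes in UTF-8.
$byteCount = strlen($string);
// mb_strlen() counts characters when told the encoding (requires mbstring).
$charCount = mb_strlen($string, 'UTF-8');

// $string[0] is only the first byte of "ą", not the whole letter.
$firstByte = bin2hex($string[0]);
$firstChar = bin2hex('ą');

printf("bytes: %d, characters: %d\n", $byteCount, $charCount);
printf("first byte: %s, full letter: %s\n", $firstByte, $firstChar);
```

This prints 12 bytes but 8 characters, and shows that $string[0] is just c4 while the full letter ą is c485. That is why the comparison in the loop never matches.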

Use preg_split() with the u (UTF-8) modifier to split your string into whole characters, then use the result to build the string with the replacements applied.
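A minimal sketch of the preg_split() approach, reusing the question's mapping. The flipped lookup table ($replacements) and building a new string ($result) are illustrative choices, not part of the original answer:

```php
<?php
// Same mapping as in the question: replacement => Lithuanian letter.
$ltSymbolsArray = array(
    'a1' => 'ą', 'c2' => 'č', 'e1' => 'ę', 'e2' => 'ė', 'i1' => 'į',
    's2' => 'š', 'u1' => 'ų', 'u2' => 'ū', 'z2' => 'ž',
);
// Flip it so each letter becomes the lookup key: 'ą' => 'a1', ...
$replacements = array_flip($ltSymbolsArray);

$string = 'ąsąžadcę';

// '//u' splits between every character; the u modifier makes PCRE treat the
// subject as UTF-8, so multibyte letters stay whole.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

// Build a new string instead of writing into $string[$i], which holds one byte.
$result = '';
foreach ($chars as $char) {
    $result .= $replacements[$char] ?? $char;
}

echo $result, "\n";
```

Running this prints a1sa1z2adce1. Appending to a fresh string avoids the second bug in the original loop: assigning 'a1' to a single string offset would keep only the 'a'.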

You could also use one of the multibyte regex functions, such as mb_ereg_replace().
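A sketch using mb_ereg_replace(), assuming the mbstring extension is built with its regex support (mbregex, normally enabled). It runs one replacement per letter:

```php
<?php
// Assumes mbstring with regex support; make sure patterns are read as UTF-8.
mb_regex_encoding('UTF-8');

// Letter => replacement (the question's array, flipped by hand).
$replacements = array(
    'ą' => 'a1', 'č' => 'c2', 'ę' => 'e1', 'ė' => 'e2', 'į' => 'i1',
    'š' => 's2', 'ų' => 'u1', 'ū' => 'u2', 'ž' => 'z2',
);

$string = 'ąsąžadcę';

// mb_ereg_replace() matches whole multibyte characters. None of these letters
// is a regex metacharacter, so each can be used as a pattern directly.
foreach ($replacements as $letter => $replacement) {
    $string = mb_ereg_replace($letter, $replacement, $string);
}

echo $string, "\n";
```

This also prints a1sa1z2adce1. For a fixed mapping like this one, strtr($string, $replacements) should give the same result without regular expressions. With an array argument it replaces whole substrings, trying longer keys first, so each two-byte letter is matched intact.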
