Using PHP's str_replace function to replace low ASCII characters in UTF-16 encoded strings

Question

I have some PHP code that I use for text filtering. During filtering, some ASCII characters such as the ampersand (&) and tilde (~) are temporarily converted to low ASCII control characters (such as decimal code points 4 and 5). Just before the final filtered output is generated, the conversion is reverted.

$temp = str_replace(array('&', '~'), array("\x04", "\x05"), $input);
// ... some filtering code that works with $temp ...
$out = str_replace(array("\x04", "\x05"), array('&', '~'), $temp);

This works well with input text in character encodings that use 8-bit code units, such as UTF-8 and ISO 8859-1. But I am not sure about input encoded in larger code units, such as UTF-16 or UTF-32. Will the first conversion step break the well-formedness of the input text? Will there be a conflict during the reversion step because of characters already present in the input? The PHP setup does not overload the multi-byte string functions.

Can anyone comment? Thanks.

Answer 1

str_replace works fine, as long as all strings passed to it are in the same encoding. It just does a binary compare/replace of the data, so the actual encoding doesn't really matter.

That's why there's no mb_str_replace in the list of multibyte string functions.
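
To illustrate the "same encoding" condition, here is a minimal sketch assuming the input is UTF-16LE without a BOM. The helpers ascii_to_utf16le and utf16le_replace are hypothetical names introduced for this example, not part of PHP or of the original code. The sketch contrasts the one-byte search strings from the question with search strings converted to the input's encoding.

```php
<?php
// Hypothetical helper: encode an ASCII-only string as UTF-16LE (no BOM).
function ascii_to_utf16le(string $ascii): string
{
    $out = '';
    for ($i = 0, $n = strlen($ascii); $i < $n; $i++) {
        $out .= $ascii[$i] . "\x00"; // low byte first, then a zero high byte
    }
    return $out;
}

// Hypothetical helper: str_replace with the ASCII search/replace strings
// converted to UTF-16LE so they match the encoding of $text.
function utf16le_replace(array $search, array $replace, string $text): string
{
    return str_replace(
        array_map('ascii_to_utf16le', $search),
        array_map('ascii_to_utf16le', $replace),
        $text
    );
}

// 'a & b ~ c' followed by U+2600, whose UTF-16LE bytes are 0x00 0x26.
$input = ascii_to_utf16le('a & b ~ c') . "\x00\x26";

// Mismatched encodings: the one-byte '&' (0x26) also matches inside U+2600,
// so the filtering code would see U+0400 instead.
$bad = str_replace(array('&', '~'), array("\x04", "\x05"), $input);

// Same encoding: only the real '&' and '~' code units are replaced,
// and the reversion restores the input byte for byte.
$temp = utf16le_replace(array('&', '~'), array("\x04", "\x05"), $input);
$out  = utf16le_replace(array("\x04", "\x05"), array('&', '~'), $temp);
```

One caveat worth keeping in mind: because the comparison is purely byte-based, a two-byte search string can in principle match across the boundary between two adjacent UTF-16 code units. If that matters for your data, converting the input to UTF-8 before filtering may be the safer route.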
