Replacing low ASCII characters in UTF-16-encoded string using PHP's str_replace function

Question

I have some PHP code that I use for text filtering. During filtering, some ASCII characters, such as the ampersand (&) and the tilde (~), are temporarily converted to low ASCII control characters (such as those with decimal code points 4 and 5). Just before the final filtered output is generated, the conversion is reverted.

$temp = str_replace(array('&', '~'), array("\x04", "\x05"), $input);
// ... some filtering code that works with $temp ...
$out = str_replace(array("\x04", "\x05"), array('&', '~'), $temp);
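A minimal sketch of the round trip above for 8-bit input, assuming the filtering step is skipped so `$temp` passes through unchanged. The helper names `protect`, `restore` and `placeholders_free` are hypothetical, and the `strpbrk` guard is just one possible way to detect the reversion conflict the question asks about; it is not part of the original code.

```php
<?php
// Hypothetical helpers wrapping the two str_replace calls from the question.
function protect(string $s): string {
    return str_replace(array('&', '~'), array("\x04", "\x05"), $s);
}
function restore(string $s): string {
    return str_replace(array("\x04", "\x05"), array('&', '~'), $s);
}

// Clean UTF-8 input ("café & tea ~ milk") round-trips unchanged.
$input = "caf\xC3\xA9 & tea ~ milk";
$out = restore(protect($input));
var_dump($out === $input);           // bool(true)

// If the input already contains a placeholder byte, restore() cannot tell it
// apart from a substituted one, so the output no longer matches the input.
$input2 = "a\x04b";
var_dump(restore(protect($input2))); // string(3) "a&b"

// One possible guard (an assumption, not from the question): only use the
// placeholders when neither byte occurs in the input.
function placeholders_free(string $s): bool {
    return strpbrk($s, "\x04\x05") === false;
}
var_dump(placeholders_free($input), placeholders_free($input2)); // bool(true), then bool(false)
```

For UTF-8 the first step is safe because the bytes 0x26 and 0x7E never occur inside a multi-byte UTF-8 sequence; the only risk is placeholder bytes that were already in the input.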

This works well with input text in character encodings that use 8-bit code units, such as UTF-8 and ISO 8859-1. But I am not sure about input encoded with larger code units, such as UTF-16 or UTF-32. Will the first conversion step break the well-formedness of the input text? Will the reversion step conflict with characters that were already present in the input? The PHP setup does not overload the multibyte string functions.
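To make the UTF-16 concern concrete, here is a small sketch that applies the question's two str_replace calls to UTF-16BE data. The big-endian byte order and the specific code points are assumptions chosen for illustration; the bytes are written as escapes so no extension is needed. Because '&' and "\x04" are single bytes, str_replace matches them anywhere in the byte stream, including inside unrelated code units.

```php
<?php
// UTF-16BE bytes: U+0026 '&' -> 00 26, U+2641 -> 26 41, U+0400 -> 04 00
$amp   = "\x00\x26";
$u2641 = "\x26\x41";
$u0400 = "\x04\x00";

// Step 1 from the question: every 0x26 byte is replaced, including the one
// inside the code unit of U+2641, which turns into U+0441.
$temp = str_replace(array('&', '~'), array("\x04", "\x05"), $amp . $u2641);
echo bin2hex($temp), "\n"; // 00040441: U+0004 followed by U+0441

// Step 2: the reversion turns every 0x04 byte into 0x26, so a character that
// already contained 0x04 (here U+0400) comes out as U+2600.
$out = str_replace(array("\x04", "\x05"), array('&', '~'), $u0400);
echo bin2hex($out), "\n";  // 2600
```

In the first case the reversion would restore the original bytes, so the damage is only visible to the filtering code in between; the second case is not recoverable.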

Can anyone comment? Thanks.

Answer 1

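str_replace works fine as long as all strings passed to it (subject, search and replacement) are in the same encoding. It simply does a binary compare/replace of the data, so the actual encoding doesn't really matter. In the code above, '&', '~', "\x04" and "\x05" are single-byte strings, so for UTF-16 or UTF-32 input they would have to be encoded the same way as the input.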

That's why there is no mb_str_replace in the list of PHP's multibyte string functions.
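A sketch of what "the same encoding" means in practice, assuming the subject is UTF-16LE: the search and replacement strings are encoded as two-byte code units too. It also shows one caveat worth checking: str_replace matches bytes at any offset, not only at code-unit boundaries, so a two-byte pattern can still match across two neighbouring UTF-16 code units.

```php
<?php
// Assumption: the subject is UTF-16LE, so '&' is stored as 26 00.
$search  = array("\x26\x00", "\x7E\x00"); // '&', '~' in UTF-16LE
$replace = array("\x04\x00", "\x05\x00"); // U+0004, U+0005 in UTF-16LE

// U+4126 is stored as 26 41, so it no longer matches the 2-byte '&'.
$input = "\x26\x00" . "\x26\x41";         // '&' followed by U+4126
$temp  = str_replace($search, $replace, $input);
echo bin2hex($temp), "\n";                // 04002641

// Caveat: U+2600 (00 26) followed by U+4100 (00 41) contains the bytes 26 00
// starting at an odd offset, and str_replace replaces that match as well.
$tricky = "\x00\x26" . "\x00\x41";
$mangled = str_replace($search, $replace, $tricky);
echo bin2hex($mangled), "\n";             // 00040041: U+0400 followed by U+4100
```

An alternative (not what the answer proposes) is to convert the text to UTF-8 first, e.g. with mb_convert_encoding(), do the replacements there, and convert back afterwards; in UTF-8 the encoding of one character never starts in the middle of another.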

1 answers

solution1
1 2012-09-15 08:45:00