I'm running into an unexpected character replacement problem. The character code is 8217, ’ (U+2019).
I've tried escaping the character with a slash, but it didn't make a difference.
php > $a = preg_replace('/([.,\'"’:?!])<\/a>/', '</a>\1', 'letter">Evolution’</a> </li>');
php > echo($a);
// => letter">Evolution/a> </li>
// Just to show that it works if the character is different
php > $a = preg_replace('/([.,\'"’:?!])<\/a>/', '</a>\1', 'letter">Evolution"</a> </li>');
php > echo($a);
letter">Evolution</a>" </li>
I would expect it to output
letter">Evolution</a>’ </li>
instead of
letter">Evolution/a> </li>
By default, PCRE (the PHP regex engine) treats your pattern as a sequence of single-byte characters. So when you write [’] you get a character class containing the three bytes that encode RIGHT SINGLE QUOTATION MARK (U+2019) in UTF-8, i.e. \xE2, \x80, \x99.
In other words, writing "/[’]/" in this default mode is like writing "/[\xE2\x80\x99]/" or "/[\x80\xE2\x99]/" or "/[\x99\xE2\x80]/", etc. The regex engine doesn't see a sequence of bytes that represents the character ’, only three independent bytes.
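This is why you get a strange result: [.,\'"’:?!] matches only one byte at a time, and the only byte of ’ that is immediately followed by </a> is the last one, \x99. That single byte is moved after </a>, which splits the character in two.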
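A quick way to check this is to look at the bytes directly. The sketch below assumes the subject is UTF-8 encoded and writes U+2019 as explicit byte escapes, so the result doesn't depend on how the source file is saved; the variable names are only illustrative.

```php
<?php
// U+2019 written as its three UTF-8 bytes
$quote   = "\xE2\x80\x99";
$subject = 'letter">Evolution' . $quote . '</a> </li>';

// No u modifier: the class holds the bytes E2, 80 and 99 separately
$pattern = '/([.,\'"' . $quote . ':?!])<\/a>/';

preg_match($pattern, $subject, $m);
$captured = bin2hex($m[1]);    // only the byte right before </a>

$broken = preg_replace($pattern, '</a>\1', $subject);

echo bin2hex($quote), "\n";    // e28099
echo $captured, "\n";          // 99
echo bin2hex($broken), "\n";   // E2 80 end up before </a>, 99 after it
```

The replaced string is no longer valid UTF-8, which is why the output looks corrupted when it is printed.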
To solve the problem, you have to force the regex engine to read your pattern as a UTF-8 encoded string. You can do that in one of these ways:
preg_replace('~(*UTF)([.,\'"’:?!])</a>~', '</a>\1', 'letter">Evolution’</a> </li>');
preg_replace('~([.,\'"’:?!])</a>~u', '</a>\1', 'letter">Evolution’</a> </li>');
This time the three bytes \xE2\x80\x99 are seen as an atomic sequence for the character ’.
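Both variants can be compared side by side. This is a minimal sketch, assuming a PCRE build that recognises the (*UTF) verb (older PCRE versions spell it (*UTF8)); the helper variable $class is introduced here only for readability.

```php
<?php
$quote   = "\xE2\x80\x99"; // U+2019 as UTF-8 bytes
$subject = 'letter">Evolution' . $quote . '</a> </li>';
$class   = '[.,\'"' . $quote . ':?!]';

// Variant 1: the u modifier
$withModifier = preg_replace('~(' . $class . ')</a>~u', '</a>\1', $subject);

// Variant 2: the (*UTF) verb at the very start of the pattern
$withVerb = preg_replace('~(*UTF)(' . $class . ')</a>~', '</a>\1', $subject);

echo $withModifier, "\n";
echo $withVerb, "\n";
```

In both cases the whole character is captured and moved after </a>, so the string stays valid UTF-8.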
Notice: (*UTF) only affects how the pattern is read, but the u modifier does more: it extends shorthand character classes (like \s, \w, \d) to Unicode characters and checks that the subject string is valid UTF-8.
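The two extra effects of the u modifier can be observed with a couple of calls. This is only a sketch: it assumes a standard PHP build where u also turns on Unicode properties for \w, and the test strings are made up for illustration.

```php
<?php
// "é" is U+00E9, encoded as two bytes in UTF-8
$letter = "\xC3\xA9";

// Without u, \w consumes a single byte, so it cannot span both bytes
$bytewise = preg_match('/^\w$/', $letter);

// With u, the two bytes are read as one letter
$unicode = preg_match('/^\w$/u', $letter);

// A lone 0xFF byte is never valid UTF-8, so the u modifier rejects the subject
$invalid     = preg_match('/./u', "\xFF");
$isUtf8Error = preg_last_error() === PREG_BAD_UTF8_ERROR;

var_dump($bytewise, $unicode, $invalid, $isUtf8Error);
```

Because preg_match returns false rather than 0 on an invalid subject, it is worth comparing the result with === when the input encoding is not guaranteed.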
Just add the Unicode flag (u) to the regex:
$a = preg_replace('/([.,\'"’:?!])<\/a>/u', '</a>\1', 'letter">Evolution’</a> </li>');
# here ________________________________^
echo($a);