Replacing UTF-8 characters in PHP 5.3

Question

Why doesn't this test case work?

<?php
// cards with Cyrillic indices and suits in UTF-8 encoding
$a = array('7♠', 'Д♠', 'К♠', '8♦', 'В♦', 'Д♦', '10♣', '10♥', 'В♥', 'Т♥');
foreach ($a as $card) {
        $suit = substr($card, -1);

        $card = preg_replace('/(\d+)♥/', '<span class="red">$1&hearts;</span>', $card);
        $card = preg_replace('/(\d+)♦/', '<span class="red">$1&diams;</span>', $card);
        $card = preg_replace('/(\d+)♠/', '<span class="black">$1&spades;</span>', $card);
        $card = preg_replace('/(\d+)♣/', '<span class="black">$1&clubs;</span>', $card);

        printf("suit: %s, html: %s\n", $suit, $card);
}
?>

Output:

suit: ▒, html: <span class="black">7&spades;</span>
suit: ▒, html: Д♠
suit: ▒, html: К♠
suit: ▒, html: <span class="red">8&diams;</span>
suit: ▒, html: В♦
suit: ▒, html: Д♦
suit: ▒, html: <span class="black">10&clubs;</span>
suit: ▒, html: <span class="red">10&hearts;</span>
suit: ▒, html: В♥
suit: ▒, html: Т♥

I'm struggling with two problems in my PHP script:

1. Why isn't the last UTF-8 character extracted correctly?
2. Why is only the first suit being replaced by preg_replace?

Using PHP 5.3.3 and PostgreSQL 8.4.12 holding UTF-8 JSON (with Russian text and card suits) on CentOS 6.2.

If 1. is a bug in PHP 5.3.3, is there a good workaround? (I don't want to upgrade the stock package.)

UPDATE:

<?php
$a = array('7♠', 'Д♠', 'К♠', '8♦', 'В♦', 'Д♦', '10♣', '10♥', 'В♥', 'Т♥');
foreach ($a as $card) {
        $suit = mb_substr($card, -1, 1, 'UTF-8');

        $card = preg_replace('/(\d+)♥/u', '<span class="red">$1&hearts;</span>', $card);
        $card = preg_replace('/(\d+)♦/u', '<span class="red">$1&diams;</span>', $card);
        $card = preg_replace('/(\d+)♠/u', '<span class="black">$1&spades;</span>', $card);
        $card = preg_replace('/(\d+)♣/u', '<span class="black">$1&clubs;</span>', $card);

        printf("suit: %s, html: %s\n", $suit, $card);
}
?>

The new output:

suit: ♠, html: <span class="black">7&spades;</span>
suit: ♠, html: Д♠
suit: ♠, html: К♠
suit: ♦, html: <span class="red">8&diams;</span>
suit: ♦, html: В♦
suit: ♦, html: Д♦
suit: ♣, html: <span class="black">10&clubs;</span>
suit: ♥, html: <span class="red">10&hearts;</span>
suit: ♥, html: В♥

Answer 1

substr is one of the naive PHP core functions that assume 1 byte = 1 character. substr(..., -1) extracts the last byte from the string, but "♠" is longer than one byte in UTF-8. You should use mb_substr($card, -1, 1, 'UTF-8') instead.
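
The byte-versus-character difference can be seen directly in a small sketch. It assumes the mbstring extension is installed and that the source file is saved as UTF-8; the helper name last_char is hypothetical and not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): returns the last
// character of a UTF-8 string rather than its last byte.
// Assumes the mbstring extension is available.
function last_char($s) {
    return mb_substr($s, -1, 1, 'UTF-8');
}

$card = '7♠';

// '7' is one byte and '♠' (U+2660) is three bytes in UTF-8.
printf("bytes: %d, chars: %d\n", strlen($card), mb_strlen($card, 'UTF-8'));

// substr() cuts off a single trailing byte, which is not a valid character.
printf("last byte (hex): %s\n", bin2hex(substr($card, -1)));

// mb_substr() counts characters, so the whole suit symbol is returned.
printf("last char: %s\n", last_char($card));
```

The lone trailing byte printed by the second line is why the terminal showed a placeholder glyph for every suit in the first output.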

You need to add the u (PCRE_UTF8) modifier to the regular expression to make it handle UTF-8 encoded patterns and strings correctly:

preg_replace('/(\d+)♥/u', ...
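
The updated output in the question still leaves cards such as Д♠ unchanged, which is consistent with (\d+) matching only digits. The following is a minimal sketch, assuming the goal is to mark up every card, and assuming the face-card letters are the Cyrillic Д, К, В and Т from the test array. The function name card_to_html and the single combined pattern are illustrative choices, not the original author's code.

```php
<?php
// Without the u modifier "." matches one byte, so a three-byte suit
// symbol is not a single "character"; with u it is.
printf("without u: %d, with u: %d\n",
    preg_match('/^.$/', '♠'), preg_match('/^.$/u', '♠'));

// Hypothetical helper (not from the original post): wraps one card such as
// '10♥' or 'Д♠' in a colored span and replaces the suit with its HTML entity.
function card_to_html($card) {
    // suit => array(CSS class, HTML entity)
    $suits = array(
        '♥' => array('red', '&hearts;'),
        '♦' => array('red', '&diams;'),
        '♠' => array('black', '&spades;'),
        '♣' => array('black', '&clubs;'),
    );
    // \d+ covers numeric indices; [ДКВТ] covers the Cyrillic face cards
    // used in the question. The u modifier is needed so that multibyte
    // characters inside [...] are treated as whole characters.
    if (!preg_match('/^(\d+|[ДКВТ])([♥♦♠♣])$/u', $card, $m)) {
        return $card; // leave unrecognized input untouched
    }
    list($class, $entity) = $suits[$m[2]];
    return sprintf('<span class="%s">%s%s</span>', $class, $m[1], $entity);
}

foreach (array('7♠', 'Д♠', '10♥', 'Т♥') as $card) {
    printf("%s\n", card_to_html($card));
}
```

Capturing the suit in its own group also makes a separate substr or mb_substr call unnecessary, since the suit is available as the second capture.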

Score 10, accepted, 2012-06-16 11:51:20