Is there a simple way to get a character from a multibyte string in PHP?

Question

This is my problem: my language (Portuguese) uses the ISO-8859-1 character encoding! When I want to access a character from a string like 'coração' (heart), I use:

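<?php
// Note: the explicit 'UTF-8' arguments below override the internal encoding,
// and the literal "coração" is UTF-8 bytes if the script file is saved as UTF-8.
mb_internal_encoding('ISO-8859-1');
$str = "coração";

$len = mb_strlen($str, 'UTF-8');           // number of characters, not bytes

for ($i = 0; $i < $len; ++$i) {
    echo mb_substr($str, $i, 1, 'UTF-8') . "<br/>";   // one character per line
}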

This produces:

c
o
r
a
ç
ã
o

This works fine... but my issue is that mb_substr is not as fast as simple, normal string access! I want a simple way to do this, like normal string character access: echo $str[$pos]. Is it possible?

Answer 1

"mb_substr function is not fast as [...] like in normal string character access: echo $str[$pos].... It is possible?"

No.

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Premature optimization

The multibyte functions have to inspect every character to determine how many bytes (1 to 4 in UTF-8) it occupies. That immediately explains why character indexing ($a[n]) won't work: you don't know which byte(s) hold the nth character until you have read all the characters before it.
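To make the "you have to read everything before it" point concrete, here is a minimal sketch, assuming the input is valid UTF-8. The helper name utf8_byte_offset is hypothetical; it finds where the nth character starts by counting bytes that are not UTF-8 continuation bytes (10xxxxxx). It also shows that $str[4] returns only the first byte of 'ç'. The string is written with \u{...} escapes so the example does not depend on the file's encoding.

```php
<?php
// Sketch (assumes $s is valid UTF-8): find the byte offset of the $n-th
// character (0-based). The only way is to scan from the start, counting
// bytes that begin a character, so this is O(n) rather than O(1).
function utf8_byte_offset(string $s, int $n): ?int
{
    $len = strlen($s);                     // length in bytes, not characters
    $count = 0;
    for ($i = 0; $i < $len; $i++) {
        // Continuation bytes have the form 10xxxxxx (0x80..0xBF);
        // any other byte is the first byte of a character.
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            if ($count === $n) {
                return $i;
            }
            $count++;
        }
    }
    return null;                           // fewer than $n + 1 characters
}

$str = "cora\u{e7}\u{e3}o";                // "coração": 7 characters, 9 bytes
echo strlen($str), "\n";                   // 9
echo bin2hex($str[4]), "\n";               // c3 (only the first byte of 'ç')
echo utf8_byte_offset($str, 5), "\n";      // 6 (where 'ã' starts)
```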

To speed things up a bit, you can look at the answers here: How to iterate UTF-8 string in PHP?
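One common way to speed up the loop from the question (a sketch, not necessarily what the linked answers do) is to walk the string once and cut out each character from its lead byte, instead of calling mb_substr($str, $i, 1) for every $i, which typically re-scans from the start each time. The generator name utf8_chars is hypothetical, and it assumes valid UTF-8 input.

```php
<?php
// Sketch (assumes valid UTF-8): yield each character in a single pass.
// A loop of mb_substr($str, $i, 1) typically re-scans from the start on
// every call, which makes the whole loop quadratic; this walk is linear.
function utf8_chars(string $s): Generator
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        // The lead byte encodes the sequence length (RFC 3629).
        if ($b < 0x80) {
            $n = 1;                        // 0xxxxxxx: ASCII
        } elseif ($b >= 0xF0) {
            $n = 4;                        // 11110xxx
        } elseif ($b >= 0xE0) {
            $n = 3;                        // 1110xxxx
        } else {
            $n = 2;                        // 110xxxxx
        }
        yield substr($s, $i, $n);
        $i += $n;
    }
}

foreach (utf8_chars("cora\u{e7}\u{e3}o") as $ch) {
    echo $ch, "<br/>";
}
```

If you need random access rather than sequential iteration, collect the characters once with iterator_to_array(utf8_chars($str), false) and index that array.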

However, since you say you use ISO 8859-1 (Latin-1), you don't have to use the mb_ functions at all, because in that encoding every character is encoded in a single byte.
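A minimal sketch of what single-byte encoding buys you: with a genuinely ISO-8859-1 string, strlen() counts characters and $str[$pos] returns a whole character. The bytes are written as \x escapes so the example does not depend on the file's encoding. Assumptions not in the original: a UTF-8 string would first be converted once, e.g. with mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'), and the page would need to be served with a Latin-1 charset (e.g. header('Content-Type: text/html; charset=ISO-8859-1')) for the bytes to display correctly.

```php
<?php
// Sketch: in ISO-8859-1 every character is exactly one byte, so plain
// $str[$pos] indexing is already correct. The string is written with
// \x escapes so the example does not depend on the file's encoding.
$latin1 = "cora\xE7\xE3o";                 // "coração" encoded as ISO-8859-1

echo strlen($latin1), "\n";                // 7: bytes == characters
echo bin2hex($latin1[4]), "\n";            // e7 ('ç' in Latin-1)

for ($i = 0, $len = strlen($latin1); $i < $len; ++$i) {
    echo $latin1[$i], "<br/>";             // one raw Latin-1 byte per character
}
```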

Answer 2

Try:

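// 'u' makes '.' match whole UTF-8 characters; 's' makes it match "\n" too
preg_match_all("/./su", $str, $ar_chars);
print_r($ar_chars);   // $ar_chars[0] holds one element per character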
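Building on that answer, a sketch of how the preg_match_all() result can stand in for $str[$pos]: split once, then index the array in O(1). Two points worth knowing, both documented PCRE behavior: without the s modifier, '.' does not match a newline, so line breaks would silently disappear; and with the u modifier, invalid UTF-8 input makes preg_match_all() return false.

```php
<?php
// Sketch: split once with preg_match_all(), then index the resulting array
// in O(1), which is the closest equivalent of $str[$pos] for UTF-8.
// The 's' modifier lets '.' also match "\n"; the 'u' modifier makes the
// pattern work on UTF-8 characters instead of bytes.
$str = "cora\u{e7}\u{e3}o";                // "coração" in UTF-8

if (preg_match_all('/./su', $str, $matches) === false) {
    // With /u, invalid UTF-8 input makes preg_match_all() fail.
    throw new RuntimeException('Input is not valid UTF-8');
}
$chars = $matches[0];                      // full-pattern matches: one per character

echo count($chars), "\n";                  // 7
echo $chars[4], "\n";                      // ç
```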

Answer 3

... Sort of. If you use a fixed-width encoding (ISO 8859-*, UCS-2, UTF-32, or UTF-16 restricted to the BMP), then you can use a fixed multiplier for character accesses. You will still need to read several bytes per character for the multi-byte fixed-width encodings (two in UCS-2, four in UTF-32), though.
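A minimal sketch of the fixed-multiplier idea using UTF-32BE, where every character is exactly 4 bytes, so character $i starts at byte $i * 4. To keep the example free of extensions, the string is built with pack() from code points; in real code it would more likely come from mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8'). The helper names utf32_char_at and utf32_codepoint_at are hypothetical.

```php
<?php
// Sketch: O(1) character access in a fixed-width encoding (UTF-32BE).
// Each character occupies exactly 4 bytes, so character $i lives at byte
// offset $i * 4.
function utf32_char_at(string $s, int $i): string
{
    return substr($s, $i * 4, 4);          // fixed multiplier: 4 bytes per char
}

function utf32_codepoint_at(string $s, int $i): int
{
    return unpack('N', utf32_char_at($s, $i))[1];   // big-endian 32-bit value
}

$codepoints = [0x63, 0x6F, 0x72, 0x61, 0xE7, 0xE3, 0x6F];  // "coração"
$utf32 = pack('N*', ...$codepoints);       // 7 characters * 4 bytes

echo strlen($utf32), "\n";                          // 28
printf("U+%04X\n", utf32_codepoint_at($utf32, 5));  // U+00E3 ('ã')
```

The trade-off is memory: UTF-32 uses four bytes even for ASCII characters.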

3 answers

Answer 1: 4 votes, 2012-05-02 11:24:05

Answer 2: 1 vote, 2012-05-02 11:34:18

Answer 3: 0 votes, 2012-04-28 05:10:40