Is there a simple way to get a character from a multibyte string in PHP?

This is my problem: my language (Portuguese) uses the ISO-8859-1 character encoding. When I want to access a character in a string like 'coração' (heart), I use:

mb_internal_encoding('ISO-8859-1');
$str = "coração";

$len = mb_strlen($str,'UTF-8');

for($i=0;$i<$len;++$i)
    echo mb_substr($str, $i, 1, 'UTF-8')."<br/>";

This produces:

c
o
r
a
ç
ã
o

This works fine, but my concern is that mb_substr is not as fast as plain string indexing. I would like a simple way to do this, like normal string character access: echo $str[$pos]. Is that possible?

"mb_substr function is not fast as [...] like in normal string character access: echo $str[$pos].... It is possible?"

No.

The multibyte functions have to check every character to determine how many bytes (1 to 4 in UTF-8) it occupies. That is also the reason why character indexing ($a[n]) won't work: you don't know which byte(s) make up the nth character until you have read all the characters before it.
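To make the difference concrete, here is a minimal sketch comparing byte indexing with mb_substr. It assumes the mbstring extension is loaded, and it writes the string as explicit UTF-8 bytes so the result does not depend on how the source file is saved.

```php
<?php
// "coração" as explicit UTF-8 bytes: ç = C3 A7, ã = C3 A3.
$str = "cora\xC3\xA7\xC3\xA3o";

echo strlen($str), "\n";              // byte count: 9
echo mb_strlen($str, 'UTF-8'), "\n";  // character count: 7

// Byte indexing returns only the first byte of the two-byte "ç".
echo bin2hex($str[4]), "\n";          // c3

// mb_substr counts characters from the start to find the 5th one.
echo bin2hex(mb_substr($str, 4, 1, 'UTF-8')), "\n";  // c3a7
```

The single byte returned by $str[4] is not a valid UTF-8 character on its own, which is why it usually shows up as a replacement glyph in the browser.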

To speed things up a bit, you can look at the answers here: How to iterate UTF-8 string in PHP?
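One common way to avoid calling mb_substr in a loop is to split the string into an array of characters once and then index the array. The sketch below is an illustration of that idea and not necessarily what the linked answers recommend; it relies on preg_split with the /u modifier.

```php
<?php
// "coração" as explicit UTF-8 bytes: ç = C3 A7, ã = C3 A3.
$str = "cora\xC3\xA7\xC3\xA3o";

// An empty pattern with /u splits between UTF-8 characters;
// PREG_SPLIT_NO_EMPTY drops the empty strings at both ends.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo count($chars), "\n";        // 7
echo bin2hex($chars[4]), "\n";   // c3a7, the bytes of "ç"

// After the one-time split, each access is a plain array lookup.
foreach ($chars as $ch) {
    echo $ch, "<br/>";
}
```

On PHP 7.4 and later, mb_str_split($str, 1, 'UTF-8') should give the same array without a regular expression.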

However, since you use ISO 8859-1 (Latin-1), you don't have to use the mb_ functions at all, because in that encoding every character is encoded in exactly one byte.
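A minimal sketch of what that looks like, assuming the string really is stored as ISO 8859-1. The question's code passes 'UTF-8' to mb_strlen and mb_substr, which suggests the data may actually be UTF-8, so the example converts explicitly before indexing.

```php
<?php
// Start from UTF-8 bytes and convert, so the example does not depend
// on the encoding of the source file.
$utf8   = "cora\xC3\xA7\xC3\xA3o";
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo strlen($latin1), "\n";         // 7: one byte per character
echo bin2hex($latin1[4]), "\n";     // e7, which is "ç" in ISO 8859-1

// Plain byte indexing now addresses whole characters.
for ($i = 0, $len = strlen($latin1); $i < $len; ++$i) {
    // Convert back to UTF-8 only for display on a UTF-8 page.
    echo mb_convert_encoding($latin1[$i], 'UTF-8', 'ISO-8859-1'), "<br/>";
}
```

If the page itself is served as ISO 8859-1, the conversion inside the loop is unnecessary and $latin1[$i] can be echoed directly.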

Try:

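// The /u modifier makes "." match one UTF-8 character instead of one byte.
preg_match_all( "/./u", $str, $ar_chars );
// The matched characters are in $ar_chars[0], e.g. $ar_chars[0][4].
print_r( $ar_chars );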

... Sort of. If you use a fixed-width encoding (ISO 8859-*, UCS-2, UTF-32, or UTF-16 restricted to the BMP), then you can use a fixed multiplier for character accesses. You will still need to read several bytes per character for the multiple-byte encodings, though.
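As a rough sketch of the fixed-multiplier idea, the string can be converted to UTF-32BE once, after which the nth character always starts at byte 4 * n. The helper name utf32_char_at is hypothetical and introduced only for this illustration; the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: return the n-th character of a UTF-32BE string as UTF-8.
function utf32_char_at(string $utf32, int $n): string
{
    // Every character occupies exactly 4 bytes, so the offset is 4 * n.
    return mb_convert_encoding(substr($utf32, $n * 4, 4), 'UTF-8', 'UTF-32BE');
}

// "coração" as explicit UTF-8 bytes: ç = C3 A7, ã = C3 A3.
$utf8  = "cora\xC3\xA7\xC3\xA3o";
$utf32 = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');

echo intdiv(strlen($utf32), 4), "\n";          // 7 characters
echo bin2hex(utf32_char_at($utf32, 4)), "\n";  // c3a7, the bytes of "ç"
```

The trade-off is memory: the converted string uses four bytes per character, and each access still reads four bytes and converts them back for output.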
