Is there a simple way to get a character from a multibyte string in PHP?
This is my problem: my language (Portuguese) uses the ISO-8859-1 character encoding! When I want to access a character from a string like 'coração' (heart), I use:
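mb_internal_encoding('ISO-8859-1');
$str = "coração";
// 'UTF-8' is passed explicitly, so these calls ignore the internal encoding set above
$len = mb_strlen($str, 'UTF-8');
for ($i = 0; $i < $len; ++$i) {
    echo mb_substr($str, $i, 1, 'UTF-8') . "<br/>";
}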
This produces:
c
o
r
a
ç
ã
o
This works fine, but my issue is that mb_substr is not as fast as plain string access! I want a simple way to do this, like normal string character access: echo $str[$pos]. Is that possible?
"mb_substr function is not fast as [...] like in normal string character access: echo $str[$pos].... It is possible?"
No.
The multibyte functions have to inspect every character to determine how many bytes (1 to 4 in UTF-8) it occupies. That is exactly why character indexing ($a[n]) won't work: you can't know which byte(s) make up the nth character until you have read all the characters before it.
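A small sketch of that argument, assuming the script file is saved as UTF-8 so that "coração" is stored as UTF-8 bytes. `utf8_seq_len` and `utf8_char_at` are hypothetical helpers written for illustration, not PHP built-ins: to reach character n they must step over every earlier byte sequence, while `$str[$n]` simply returns byte n.

```php
<?php
// Sketch: why $str[$n] cannot return the n-th character of a UTF-8 string.
// Assumes this file is saved as UTF-8, so "coração" is stored as UTF-8 bytes.

// Hypothetical helper: length of a UTF-8 sequence, judged from its lead byte.
function utf8_seq_len(int $lead): int
{
    if ($lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if (($lead & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($lead & 0xF0) === 0xE0) return 3; // 1110xxxx
    return 4;                              // 11110xxx (valid input assumed)
}

// Hypothetical helper: find the n-th character by walking every sequence before it.
function utf8_char_at(string $str, int $n): ?string
{
    $byte = 0;
    $len = strlen($str);
    for ($i = 0; $byte < $len; ++$i) {
        $size = utf8_seq_len(ord($str[$byte]));
        if ($i === $n) {
            return substr($str, $byte, $size);
        }
        $byte += $size;
    }
    return null; // $n is past the last character
}

$str = "coração";
echo strlen($str), "\n";          // 9 bytes for 7 characters
echo bin2hex($str[4]), "\n";      // c3: only the first byte of "ç"
echo utf8_char_at($str, 4), "\n"; // ç
```

The cost of `utf8_char_at` grows with n, which is the same linear scan the mb_ functions perform internally.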
To speed things up a bit, you can look at the answers here: How to iterate UTF-8 string in PHP?
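One common way to get `$str[$pos]`-style access (not necessarily the approach in the linked answers) is to decode the string once into an array of characters and index that array afterwards. A minimal sketch, assuming the input is valid UTF-8:

```php
<?php
// Sketch: pay the UTF-8 decoding cost once, then index characters in O(1).
// Assumes $str holds valid UTF-8 (preg_split with /u returns false otherwise).
$str = "coração";

// Split into an array of characters; PREG_SPLIT_NO_EMPTY drops the empty
// pieces produced at the start and end of the string.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

$count = count($chars);
for ($i = 0; $i < $count; ++$i) {
    echo $chars[$i], "<br/>";
}

echo $chars[5], "\n"; // ã, via plain array access
```

The split still scans the whole string once, so this pays off when the same string is indexed many times. On PHP 7.4+ with mbstring, mb_str_split($str) should return the same array.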
However, since you use ISO 8859-1 (Latin-1), you don't have to use the mb_ functions at all, because in that encoding every character is encoded in a single byte.
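If the data really is ISO-8859-1, plain byte indexing is enough. In this sketch the Latin-1 bytes are written as escapes, an assumption made so the example does not depend on how the source file itself is encoded:

```php
<?php
// Sketch: in ISO-8859-1 every character is exactly one byte, so $str[$i]
// is the i-th character.
$latin1 = "cora\xE7\xE3o"; // "coração" in ISO-8859-1 (ç = 0xE7, ã = 0xE3)

// If the literal is actually stored as UTF-8 (as the 'UTF-8' arguments in the
// question suggest), it would need converting first, e.g. with
// iconv('UTF-8', 'ISO-8859-1', $str).

$len = strlen($latin1); // 7: one byte per character
for ($i = 0; $i < $len; ++$i) {
    // Plain byte access; the page must be served as ISO-8859-1
    // (Content-Type: text/html; charset=ISO-8859-1) to render correctly.
    echo $latin1[$i], "<br/>";
}
```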
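Try:
// /u makes "." match whole UTF-8 characters; /s also lets it match newlines
preg_match_all("/./su", $str, $ar_chars);
// $ar_chars[0] holds the list of matched characters
print_r($ar_chars[0]);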
... Sort of. If you use a fixed-width encoding (ISO 8859-*, UCS-2, UTF-32, or UTF-16 within the BMP), then you can use a fixed multiplier for character accesses. You will still need to read several bytes per character for the multi-byte encodings, though.
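A sketch of the fixed-multiplier idea using UTF-32BE, assuming the iconv extension is available (it is enabled by default in most PHP builds) and the source file is UTF-8. `char_at_utf32` and `WIDTH` are hypothetical names. Character n sits at byte offset n * 4, so no scan of earlier characters is needed, but each access still reads 4 bytes and the result has to be converted back to UTF-8 for display.

```php
<?php
// Sketch: with a fixed-width encoding, character n starts at byte n * WIDTH.
// Assumes the iconv extension and a UTF-8 source file.
$str = "coração";
$utf32 = iconv('UTF-8', 'UTF-32BE', $str); // 4 bytes per character, no BOM

const WIDTH = 4;

// Hypothetical helper: read the WIDTH bytes at offset n * WIDTH directly.
function char_at_utf32(string $utf32, int $n): string
{
    $unit = substr($utf32, $n * WIDTH, WIDTH); // several bytes per character
    return iconv('UTF-32BE', 'UTF-8', $unit);  // convert back for display
}

echo strlen($utf32) / WIDTH, "\n";   // 7 characters
echo char_at_utf32($utf32, 5), "\n"; // ã
```

The trade-off is memory: UTF-32 spends four bytes on every character, including plain ASCII letters.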