How can I get the single bytes from a multibyte PHP string variable in a binary-safe way?
Let's say (for simplicity's sake) that I have a multibyte, UTF-8-encoded string variable with 3 letters (consisting of 4 bytes):
$original = 'Fön';
Since it's UTF-8, the bytes' hex values are (excluding the BOM):
46 C3 B6 6E
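You can confirm these byte values from PHP itself. Below is a minimal sketch. It assumes PHP 7.0 or later for the `\u{00F6}` escape, which writes ö as its UTF-8 bytes no matter how the script file is saved. `bin2hex()` prints one hex pair per byte.

```php
<?php
// "\u{00F6}" is ö; PHP stores it as the UTF-8 bytes C3 B6,
// so this string does not depend on the source file's encoding.
$original = "F\u{00F6}n";

echo strlen($original), "\n";               // 4 (bytes, not characters)
echo strtoupper(bin2hex($original)), "\n";  // 46C3B66E
```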
As the $original variable is user-defined, I will need to handle two things:
1. Get the exact number of bytes in the string.
2. Access each of those bytes individually.
I would tend to use strlen() to handle "1.", and access the $original variable's bytes with a simple $original[$byteposition], like this:
<?php
header('Content-Type: text/html; charset=UTF-8');

$original = 'Fön';
$totalbytes = strlen($original);

for ($byteposition = 0; $byteposition < $totalbytes; $byteposition++)
{
    $currentbyte = $original[$byteposition];

    /*
    Doesn't work, since var_dump shows only 3 bytes.
    */
    var_dump($currentbyte);

    /*
    Fails too: this prints "46 F6 6E" instead of "46 C3 B6 6E".
    */
    printf("%02X", ord($currentbyte));
    echo('<br>');
}
exit();
?>
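In PHP, `strlen()` and `$string[$i]` both work on bytes, not characters. `ord()` returns the value (0 to 255) of the first byte of its argument, whether or not that byte is ASCII. So the loop above already reads single bytes. An output of `46 F6 6E` suggests that the script file itself was saved as ISO-8859-1 (or Windows-1252), where ö is the single byte F6, rather than as UTF-8.

The sketch below avoids any dependence on the file encoding by spelling the bytes out with escapes. It runs the same loop over both encodings. `hexBytes` is a hypothetical helper name, and the `\u{...}` escape needs PHP 7.0+.

```php
<?php
// Hypothetical helper: hex dump of every byte in $s, e.g. "46 C3 B6 6E"
function hexBytes(string $s): string
{
    $hex = [];
    $total = strlen($s);                        // byte count, not character count
    for ($i = 0; $i < $total; $i++) {
        $hex[] = sprintf("%02X", ord($s[$i]));  // $s[$i] is a single byte
    }
    return implode(" ", $hex);
}

$utf8   = "F\u{00F6}n";  // ö as UTF-8: two bytes, C3 B6
$latin1 = "F\xF6n";      // ö as ISO-8859-1: one byte, F6

echo hexBytes($utf8), "\n";    // 46 C3 B6 6E
echo hexBytes($latin1), "\n";  // 46 F6 6E
```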
This proves my initial idea does not work.
So, how can I get the single bytes from a multibyte PHP string variable in a binary-safe way?
What I am looking for is a binary-safe way to convert UTF-8 string(s) into byte array(s).
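You can get a byte array by unpacking the UTF-8-encoded string $a:
$a = utf8_encode('Fön');  // converts ISO-8859-1 to UTF-8
$b = unpack('C*', $a);    // the resulting array is indexed from 1
var_dump($b);
The format C* means "unsigned char", repeated for every remaining byte. Note that utf8_encode() assumes its input is ISO-8859-1, so it is only needed when the string is not already UTF-8. It is also deprecated as of PHP 8.2.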
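If the string is already UTF-8 (for example, because the PHP file is saved as UTF-8), drop the `utf8_encode()` call. Otherwise it would encode ö a second time. Below is a minimal sketch, assuming PHP 7.0+ for the `\u{00F6}` escape. Because `unpack()` numbers its result from 1, `array_values()` is used here to get a 0-based array.

```php
<?php
$original = "F\u{00F6}n";  // already UTF-8: 46 C3 B6 6E

// "C*" = unsigned char, repeated for every byte; keys start at 1
$bytes = unpack('C*', $original);

// Re-index from 0
$bytes = array_values($bytes);

echo implode(", ", $bytes), "\n";  // 70, 195, 182, 110
```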
I actually wrote my own class for this problem. I was trying to replicate JavaScript's new TextEncoder("utf-8").encode(...) in PHP.
So this is what I came up with: it uses the PHP ord() function to get the bytes, and the chr() function to build the UTF-8 message back.
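class Uint8Array {
    public $val = array();  // byte values, 0-255
    public $length = 0;     // number of bytes

    // $mode "utf8": take the raw bytes of $string
    // $mode "hex":  parse hex digits without spaces, e.g. "46c3b66e"
    public function from($string, $mode = "utf8") {
        if ($mode == "utf8") {
            $arr = [];
            // str_split() splits into single bytes, not characters
            foreach (str_split($string) as $chr) {
                $arr[] = ord($chr);
            }
            $this->val = $arr;
            $this->length = count($arr);
            return $arr;
        } elseif ($mode == "hex") {
            $arr = [];
            // step two hex digits (one byte) at a time; never read past the end
            for ($i = 0; $i + 1 < strlen($string); $i += 2) {
                $arr[] = hexdec($string[$i] . $string[$i + 1]);
            }
            $this->val = $arr;
            $this->length = count($arr);
            return $arr;
        }
    }

    // $enc "utf8": rebuild the original byte string
    // $enc "hex":  two lowercase hex digits per byte
    public function toString($enc = "utf8") {
        if ($enc == "utf8") {
            $str = "";
            foreach ($this->val as $byte) {
                $str .= chr($byte);
            }
            return $str;
        } elseif ($enc == "hex") {
            $str = "";
            foreach ($this->val as $byte) {
                $str .= str_pad(dechex($byte), 2, "0", STR_PAD_LEFT);
            }
            return $str;
        }
    }
}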
Use it like this:
Create an instance:
$handle = new Uint8Array;
Input with ->from(string, encoding), where the encoding is either 1) "utf8" or 2) "hex" (hex bytes without spaces):
$handle->from("Fön", "utf8");
// or with hex bytes
$handle->from("46c3b66e", "hex");
Output with ->toString(encoding), where the encoding is "hex" or "utf8":
$to_utf8 = $handle->toString("utf8");
// Fön
$to_hex = $handle->toString("hex");
// 46c3b66e
The byte array itself can be found in ->val, as you can see here:
$bytearray = $handle->val;
// [70, 195, 182, 110]
$arrayleng = $handle->length;
// 4
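For comparison, PHP's built-in functions cover the same conversions without a class. This is an alternative sketch, not the author's code. `unpack()` and `pack()` with the `C*` format convert between a byte string and an array of byte values, and `bin2hex()` and `hex2bin()` convert to and from the hex form. It assumes PHP 7.0+ for the `\u{00F6}` escape.

```php
<?php
$original = "F\u{00F6}n";  // UTF-8 bytes 46 C3 B6 6E

// string -> byte array (0-based), like ->from($string, "utf8")
$bytes = array_values(unpack('C*', $original));  // [70, 195, 182, 110]

// byte array -> string, like ->toString("utf8")
$utf8 = pack('C*', ...$bytes);

// string <-> hex, like ->toString("hex") and ->from($hex, "hex")
$hex  = bin2hex($original);                         // "46c3b66e"
$back = array_values(unpack('C*', hex2bin($hex)));  // same bytes again

echo $hex, "\n";
```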
That's all; feel free to use it!
You can learn more about the functions used in the PHP manual: chr(), ord().
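Splitting into bytes first matters because `ord()` only looks at the first byte of the string it receives. Below is a small sketch of both directions, assuming PHP 7.0+ for the `\u{00F6}` escape.

```php
<?php
$o = "\u{00F6}";  // ö in UTF-8: two bytes, C3 B6

// ord() of a multibyte character returns only its first byte
echo ord($o), "\n";  // 195 (0xC3), not 246 (U+00F6)

// str_split() with the default length splits into 1-byte chunks
$bytes = array_map('ord', str_split($o));  // [195, 182]

// chr() turns each byte value back into a 1-byte string
$rebuilt = implode('', array_map('chr', $bytes));
echo $rebuilt === $o ? "round trip ok\n" : "mismatch\n";
```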