PHP Unicode to UTF-8 code

I'm trying to get the UTF-8 bytes (as decimal values) of a Unicode string. For instance:

function unicode_to_utf8_bytes($string) {

}

$text = 'Hello 😀';
$result = unicode_to_utf8_bytes($text);

var_dump($result);

Expected output:

array(10) {
  [0]=>
  int(72)
  [1]=>
  int(101)
  [2]=>
  int(108)
  [3]=>
  int(108)
  [4]=>
  int(111)
  [5]=>
  int(32)
  [6]=>
  int(240)
  [7]=>
  int(159)
  [8]=>
  int(152)
  [9]=>
  int(128)
}

An example of the result can be seen here:

http://apps.timwhitlock.info/unicode/inspect?s=Hello+%F0%9F%98%80
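For reference, here is where 240, 159, 152, 128 come from. 😀 is code point U+1F600 (128512), which is above U+FFFF, so UTF-8 stores it in the four-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, spreading the 21 bits of the code point over the x positions. Below is a minimal sketch of that bit arithmetic in PHP. `codepoint_to_utf8_bytes()` is a hypothetical helper, not a built-in, and it assumes a valid Unicode scalar value (no surrogates, at most U+10FFFF):

```php
<?php
// Hypothetical helper (not a PHP built-in): encode one Unicode code point
// as its UTF-8 byte values. Assumes $cp is a valid scalar value <= 0x10FFFF.
function codepoint_to_utf8_bytes(int $cp): array
{
    if ($cp < 0x80) {           // 1 byte:  0xxxxxxx
        return [$cp];
    }
    if ($cp < 0x800) {          // 2 bytes: 110xxxxx 10xxxxxx
        return [0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)];
    }
    if ($cp < 0x10000) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | ($cp >> 12), 0x80 | (($cp >> 6) & 0x3F), 0x80 | ($cp & 0x3F)];
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [
        0xF0 | ($cp >> 18),
        0x80 | (($cp >> 12) & 0x3F),
        0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F),
    ];
}

// U+1F600 (😀) is code point 128512
echo implode(' ', codepoint_to_utf8_bytes(0x1F600)), "\n"; // 240 159 152 128
```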

I feel I'm close; this is what I managed to get:

function utf8_char_code_at($str, $index) {

    $char = mb_substr($str, $index, 1, 'UTF-8');

    if (mb_check_encoding($char, 'UTF-8')) {
        $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
        return hexdec(bin2hex($ret));
    }
    else
        return null;

}

function unicode_to_utf8_bytes($str) { 

    $result = array();

    for ($i=0; $i<mb_strlen($str, '8bit'); $i++)
        $result[] = utf8_char_code_at($str, $i);

    return $result;

}

$string = 'Hello 😀';

var_dump(unicode_to_utf8_bytes($string));

Output:

array(10) {
  [0]=>
  int(72)
  [1]=>
  int(101)
  [2]=>
  int(108)
  [3]=>
  int(108)
  [4]=>
  int(111)
  [5]=>
  int(32)
  [6]=>
  int(128512)
  [7]=>
  int(0)
  [8]=>
  int(0)
  [9]=>
  int(0)
}
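The attempt above goes wrong in two places. mb_convert_encoding($char, 'UTF-32BE', 'UTF-8') produces the character's code point (U+1F600 = 128512), not its UTF-8 bytes. And the loop bound mb_strlen($str, '8bit') counts bytes (10), while mb_substr() indexes characters (7), so for $i = 7..9 mb_substr() returns an empty string and hexdec(bin2hex('')) evaluates to 0. Below is a sketch that keeps the per-character idea but emits each character's bytes instead. It uses preg_split() with the /u modifier rather than the mb_* functions, and assumes the input is valid UTF-8:

```php
<?php
// Sketch of a corrected per-character version (assumes valid UTF-8 input).
// preg_split('//u') splits on UTF-8 character boundaries; we then emit the
// byte values of each character rather than its code point.
function unicode_to_utf8_bytes(string $str): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends of the split.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // With /u, preg_split() returns false on malformed UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $result = [];
    foreach ($chars as $char) {
        // Each $char is 1-4 bytes; ord() on each byte gives 0-255.
        for ($i = 0, $n = strlen($char); $i < $n; $i++) {
            $result[] = ord($char[$i]);
        }
    }
    return $result;
}

echo implode(' ', unicode_to_utf8_bytes('Hello 😀')), "\n";
// 72 101 108 108 111 32 240 159 152 128
```

Splitting into characters is not actually required to get the bytes, since a string's UTF-8 bytes are just the concatenation of each character's bytes; the byte-level loop in the answer relies on exactly that.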

Any help will be much appreciated!

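PHP strings are plain byte sequences, so the mb_* functions aren't needed here: strlen() counts bytes and ord() returns the value of a single byte. This gets the result you were looking for: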

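$string = 'Hello 😀';
var_export(ascii_to_dec($string));

function ascii_to_dec($str)
{
  // strlen() counts bytes, not characters, and $str[$i] is the i-th byte,
  // so ord() yields each byte's value (0-255). The older $str{$i} offset
  // syntax was deprecated in PHP 7.4 and removed in PHP 8.0.
  $dec_array = [];
  for ($i = 0, $j = strlen($str); $i < $j; $i++) {
    $dec_array[] = ord($str[$i]);
  }
  return $dec_array;
}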

Results:

array (
  0 => 72,
  1 => 101,
  2 => 108,
  3 => 108,
  4 => 111,
  5 => 32,
  6 => 240,
  7 => 159,
  8 => 152,
  9 => 128,
)
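The same byte list can also be obtained without an explicit loop. A sketch using the built-in unpack() with the C* format (unsigned char, repeated to the end of the string). unpack() numbers its results from 1, so array_values() re-indexes them from 0 to match the output above:

```php
<?php
$string = 'Hello 😀';

// 'C*' = unsigned char (one byte, 0-255), repeated until the data runs out.
// unpack() keys the results 1..n, so array_values() shifts them to 0..n-1.
$bytes = array_values(unpack('C*', $string));

var_export($bytes);
```

This prints the same array as the loop version, since both read the string byte by byte.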
