
How to print Hexadecimal UTF-8 characters in PHP

How can I print UTF-8 characters from their hexadecimal UTF-8 values? I read this post, but it did not solve my problem...

I work with many strings that are Sanskrit words stored in a database. I have their HTML values, 16-bit binary code points, hex codes, and decimal codes, but I want to be able to work with their hexadecimal UTF-8 values and output the characters themselves.

For example, the word आम has a binary UTF-8 value of 111000001010010010111000111000001010010010101110. I want to see/store/print its hexadecimal UTF-8 value and print the characters it represents.

Here's a snippet of my code:

$BinaryUTF8 = "111000001010010010000110111000001010010010101110";

$Temporary = dechex(bindec($BinaryUTF8));

$HexadecimalUTF8 = NULL;

for($i = 0; $i < strlen($Temporary); $i+=2)
{
    $HexadecimalUTF8 .= "\x".$Temporary[$i].$Temporary[$i+1];
}

$Test = "\xe0\xa4\x86\xe0\xa4\xae";

echo "\$Test = ".$Test;

echo "<br>";

echo "\$HexadecimalUTF8 = ".$HexadecimalUTF8;

The output is:

$Test = आम
$HexadecimalUTF8 = \xe0\xa4\x86\xe0\xa4\xae

$Test outputs the desired characters.

Why does $HexadecimalUTF8 not output the desired characters?

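Your binary is wrong (I have fixed it below).

You are building a string that contains the literal text "\xe0" (a backslash, an x, and two hex digits) instead of the single byte that this escape represents. PHP only resolves \x escapes when it parses a double-quoted string literal; text put together at run time is never re-parsed. The hex is really just a number.

This seems to work now: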

<?php
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";

$Temporary = dechex(bindec($BinaryUTF8));

$HexadecimalUTF8 = NULL;

for($i = 0; $i < strlen($Temporary); $i+=2)
{
    $HexadecimalUTF8 .= '\x' . $Temporary[$i].$Temporary[$i+1];
}

$Test = "\xe0\xa4\x86\xe0\xa4\xae";

echo "\$Test = ".$Test;

echo "<br>";
echo "\$HexadecimalUTF8 = " . makeCharFromHex($HexadecimalUTF8);

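// Replace each literal "\xNN" sequence in $hex with the byte it names.
function makeCharFromHex($hex) {
    return preg_replace_callback(
        '#\\\\x([0-9A-F]{2})#i',
        function ($matches) {
            // $matches[1] holds only the two hex digits, so hexdec() gets no stray
            // "\x" characters (PHP 7.4+ raises a deprecation notice for those).
            return chr(hexdec($matches[1]));
        },
        $hex
    );
}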
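
The difference between an escape in a literal and the same characters glued together at run time is easy to check. Below is a minimal sketch, assuming PHP 5.4+ (for hex2bin()) on a 64-bit build; $literal, $built and $hex are names made up for this illustration. It also suggests that the "\x" text may not be needed at all, since hex2bin() converts plain hex digits straight to bytes.

```php
<?php
// Escape resolved by the parser: a single byte, 0xE0.
$literal = "\xe0";

// Same characters built at run time: backslash, 'x', 'e', '0' (four bytes of text).
$built = '\x' . 'e0';

echo strlen($literal), "\n"; // 1
echo strlen($built), "\n";   // 4

// Skipping the "\x" text entirely: hex digits -> raw bytes.
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
$hex = dechex(bindec($BinaryUTF8)); // "e0a486e0a4ae" (assumes 64-bit integers)
echo hex2bin($hex), "\n";           // आम
```

Note that hex2bin() needs an even number of hex digits, so this shortcut only works as-is when the first byte does not start with a zero nibble.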

This question reminds me how poor PHP's multibyte support is.

To print a UTF-8 character from its decimal code point, you can use this function:

<?php

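function chr_utf8($n, $f = 'C*') {
    if ($n < 0x80)     return chr($n);                                        // 1 byte:  0xxxxxxx
    if ($n < 0x800)    return pack($f, 0xC0 | ($n >> 6), 0x80 | ($n & 0x3F)); // 2 bytes: 110xxxxx 10xxxxxx
    if ($n < 0x10000)  return pack($f, 0xE0 | ($n >> 12), 0x80 | (($n >> 6) & 0x3F), 0x80 | ($n & 0x3F)); // 3 bytes
    if ($n < 0x110000) return pack($f, 0xF0 | ($n >> 18), 0x80 | (($n >> 12) & 0x3F), 0x80 | (($n >> 6) & 0x3F), 0x80 | ($n & 0x3F)); // 4 bytes
    return ''; // outside the Unicode range
}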

echo chr_utf8(9405).chr_utf8(9402).chr_utf8(9409).chr_utf8(hexdec('24C1')).chr_utf8(9412);

// Output ⒽⒺⓁⓁⓄ

// Note: to pass a code point written in hexadecimal, convert it with hexdec() first.
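
To see what the three-byte branch is doing, here is the same bit arithmetic spelled out for U+0906 (आ), the first character of the word in the question. This is only an illustrative sketch: the variable names are made up, and it covers just the range U+0800 to U+FFFF.

```php
<?php
$codePoint = 0x0906; // आ, DEVANAGARI LETTER AA

// Three-byte layout: 1110xxxx 10xxxxxx 10xxxxxx
$byte1 = 0xE0 | ($codePoint >> 12);         // top 4 bits
$byte2 = 0x80 | (($codePoint >> 6) & 0x3F); // middle 6 bits
$byte3 = 0x80 | ($codePoint & 0x3F);        // low 6 bits

$char = pack('C*', $byte1, $byte2, $byte3);

echo bin2hex($char), "\n"; // e0a486
echo $char, "\n";          // आ
```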

For your snippet you can try this (you can check it at https://eval.in/748161):

<?php

// function chr_utf8 shown above is required…

$BinaryUTF8 = "111000001010010010000110111000001010010010101110";

// Each alternative captures only the payload bits of a 1-, 2-, 3- or 4-byte UTF-8 sequence.
if (preg_match_all('#(0[01]{7})|(?:110([01]{5})10([01]{6}))|(?:1110([01]{4})10([01]{6})10([01]{6}))|(?:11110([01]{3})10([01]{6})10([01]{6})10([01]{6}))#',$BinaryUTF8,$a,PREG_SET_ORDER))
$result=implode('',array_map(function($n){return chr_utf8(bindec(implode('',array_slice($n,1))));},$a));

echo $result;

// Output आम

// Note: if you work with "binary", the length of the input must be a multiple of 8.
// You can't strip leading zeros, because the regex would then fail to detect the character…

Another nice one-line solution is the following (PHP 5.6+ required). You can check it at https://eval.in/748162

<?php

$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
echo pack('C*',...array_map('bindec',str_split($BinaryUTF8,8)));

// Output आम

// Note: the length of $BinaryUTF8 must be a multiple of 8.
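
Since the question also asks for the hexadecimal UTF-8 value itself, the packed bytes can be handed to bin2hex(), and hex2bin() takes you back again. A minimal sketch, assuming PHP 5.6+ (for the ... operator); the length check and the names $bytes and $hex are additions for illustration only.

```php
<?php
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";

// Guard the assumption from the note above: whole bytes only.
if (strlen($BinaryUTF8) % 8 !== 0) {
    throw new InvalidArgumentException('Bit string length must be a multiple of 8.');
}

// Bits -> raw UTF-8 bytes, one byte per group of 8 bits.
$bytes = pack('C*', ...array_map('bindec', str_split($BinaryUTF8, 8)));

$hex = bin2hex($bytes);
echo $hex, "\n";          // e0a486e0a4ae
echo hex2bin($hex), "\n"; // आम
```

Because each byte is converted on its own, this does not depend on the platform's integer size, unlike calling dechex(bindec(...)) on the whole bit string.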


 