How to match JavaScript Unicode string processing with PHP?

I am trying to handle strings in both PHP and JavaScript, and I want them to behave the same. I wrote a JavaScript version of PHP's chr() function to do this. However, I ran into a UTF-8/Unicode issue. For example, I want to build a string containing the Chinese characters "a大小b", which works correctly in PHP but fails in JavaScript with the code below. What is wrong with the implementation?

The output is:

  php str=a----
  php str=a�----
  php str=a��----
  php str=a大----
  php str=a大�----
  php str=a大��----
  php str=a大小----
  php str=a大小b----

  --------

  js str=a---
  js str=aå---
  js str=aå¤---
  js str=a大---
  js str=a大å---
  js str=a大å°---
  js str=a大å°---
  js str=a大å°b---

The code I used is as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<div class="container">

<?php 
    $string5 = "" ; 
    $str_a = chr(97) ; 
    $string5 .= $str_a ;   echo "php str=$string5----<br>" ; 


    $str_c1 = chr(229) ; 
    $string5 .= $str_c1 ;   echo "php str=$string5----<br>" ; 
    $str_c2 = chr(164) ; 
    $string5 .= $str_c2 ;   echo "php str=$string5----<br>" ; 
    $str_c3 = chr(167) ; 
    $string5 .= $str_c3 ;   echo "php str=$string5----<br>" ; 


    $str_cs1 = chr(229) ; 
    $string5 .= $str_cs1 ;   echo "php str=$string5----<br>" ; 
    $str_cs2 = chr(176) ; 
    $string5 .= $str_cs2 ;   echo "php str=$string5----<br>" ; 
    $str_cs3 = chr(143) ; 
    $string5 .= $str_cs3 ;   echo "php str=$string5----<br>" ; 


    $str_b= chr(98) ; 
    $string5 .= $str_b ;   echo "php str=$string5----<br>" ; 

    echo "<br><br>--------<br><br>" ; 
?>


<script language = "JavaScript">   

    function chr2(codePt) {
      if (codePt > 0xFFFF) { 
        codePt -= 0x10000;
        return String.fromCharCode(0xD800 + (codePt >> 10), 0xDC00 + (codePt & 0x3FF));
      }
      return String.fromCharCode(codePt);
    }

    var string5 = "" ; 
    var str_a = chr2(97) ; 
    string5 += str_a ;     document.write( "js str="+string5+"---<br>"  ); 

    var str_c1 = chr2(229) ; 
    string5 += str_c1 ;   document.write( "js str="+string5+"---<br>"  ); 
    var str_c2 = chr2(164) ; 
    string5 += str_c2 ;   document.write( "js str="+string5+"---<br>"  ); 
    var str_c3 = chr2(167) ; 
    string5 += str_c3 ;   document.write( "js str="+string5+"---<br>"  ); 


    var str_cs1 = chr2(229) ; 
    string5 += str_cs1 ;   document.write( "js str="+string5+"---<br>"  ); 
    var str_cs2 = chr2(176) ; 
    string5 += str_cs2 ;   document.write( "js str="+string5+"---<br>"  ); 
    var str_cs3 = chr2(143) ; 
    string5 += str_cs3 ;   document.write( "js str="+string5+"---<br>"  ); 

    var str_b = chr2(98) ; 
    string5 += str_b ;   document.write( "js str="+string5+"---<br>"  ); 

</script>


</div> 
</body>
</html>

PHP and JavaScript strings are fundamentally different. A PHP string is a series of bytes. A JavaScript string is a series of characters. (Actually a series of UTF-16 code units, but that's irrelevant to this example.)
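To make the difference concrete, here is a minimal sketch that looks at the same text both ways, assuming an environment where the standard `TextEncoder` is available (modern browsers and recent Node.js). The variable names are only illustrative.

```javascript
// The string "a大小b", written with escapes so the source encoding doesn't matter.
const text = "a\u5927\u5c0fb";

// JavaScript view: a sequence of UTF-16 code units.
const codeUnitCount = text.length;

// PHP-like view: the UTF-8 bytes that make up the same text.
const utf8Bytes = Array.from(new TextEncoder().encode(text));

console.log(codeUnitCount);    // 4
console.log(utf8Bytes.length); // 8
console.log(utf8Bytes);        // [97, 229, 164, 167, 229, 176, 143, 98]
```

The eight byte values are exactly the arguments passed to chr() in the PHP code from the question.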

大 is character U+5927 (Han Ideograph Big). To generate it in JavaScript you would use String.fromCharCode(0x5927) (or chr2(0x5927) using the above helper function).
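As a rough illustration, the target string can be built from code points rather than bytes. This sketch uses the standard `String.fromCodePoint`, which also handles code points above 0xFFFF and so covers the same ground as the chr2 helper; U+5C0F for 小 is derived from the bytes 229, 176, 143 used in the question.

```javascript
// Build "a大小b" from Unicode code points instead of UTF-8 bytes.
const big = String.fromCodePoint(0x5927);   // 大
const small = String.fromCodePoint(0x5c0f); // 小
const result = "a" + big + small + "b";

console.log(result);        // a大小b
console.log(result.length); // 4
```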

229, 164, 167 is the byte representation of 大 in the UTF-8 encoding ("\xE5\xA4\xA7"). Splitting the character in the middle of its byte sequence is invalid, which is why you get the replacement character (�) in the PHP output. You can't split the byte sequence in the middle in JavaScript because its string model is character-based, so the code will never work the same.
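If the input really does arrive one byte at a time, one possible approach (a sketch, not the only way) is to keep the data as bytes and let the standard `TextDecoder` turn them into characters. With the `stream: true` option the decoder holds back an incomplete sequence until its remaining bytes arrive, so no half-character ever reaches the string. This assumes `TextDecoder` is available, as it is in modern browsers and recent Node.js; the `steps` array exists only to show the intermediate results.

```javascript
// The same byte values the PHP code passes to chr().
const bytes = [97, 229, 164, 167, 229, 176, 143, 98];

const decoder = new TextDecoder("utf-8");
let text = "";
const steps = [];

for (const b of bytes) {
  // stream: true buffers incomplete multi-byte sequences between calls.
  text += decoder.decode(new Uint8Array([b]), { stream: true });
  steps.push(text);
}
text += decoder.decode(); // flush the decoder at the end of input

console.log(steps); // ["a", "a", "a", "a大", "a大", "a大", "a大小", "a大小b"]
console.log(text);  // a大小b
```

Unlike the PHP output, the intermediate steps show nothing for a partial sequence, because the decoder waits for the complete character instead of emitting invalid bytes.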
