I'm just trying to understand character encoding a bit better, so I'm doing a few tests.
I have a PHP file that is saved as UTF-8 and looks like this:
<?php
declare(encoding='UTF-8');
header( 'Content-type: text/html; charset=utf-8' );
?><!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<title>Test</title>
</head>
<body>
<?php echo "\xBD"; # Does not work ?>
<?php echo htmlentities( "\xBD" ) ; # Works ?>
</body>
</html>
The page itself shows a diamond question mark for the first echo, while the second echo renders correctly.
The gist of the problem is that my web application has a bunch of character encoding problems: people copy and paste from Outlook or Word, and the characters get transformed into the diamond question marks. (Do those have a real name?)
I'm trying to learn how to make sure all my input is transformed into UTF-8 when the page loads (basically $_GET, $_POST, and $_REQUEST), and all output is done using proper UTF-8 handling methods.
My question is: Why is my page showing the question mark for the first echo, and does anyone have any other information about making a UTF-8 safe web app in PHP?
A lone 0xBD byte is not valid UTF-8: it is a continuation byte with no lead byte before it. If you want to encode "½" in UTF-8 then you need to use the two bytes 0xC2 0xBD instead.
>>> print(b'\xc2\xbd'.decode('utf-8'))
½
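To see the difference between the two byte sequences, here is a small Python 3 sketch in the same spirit as the REPL line above. The helper name `is_valid_utf8` is made up for illustration; it simply asks a strict UTF-8 decoder whether it accepts the bytes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

lone = b"\xbd"         # the single byte echoed in the question
pair = b"\xc2\xbd"     # the two-byte UTF-8 encoding of U+00BD (one half)

print(is_valid_utf8(lone))          # False
print(is_valid_utf8(pair))          # True
print("\u00bd".encode("utf-8"))     # b'\xc2\xbd'

# 0xBD has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation
# bytes, so it cannot start a character on its own.
print(bin(0xBD))                    # 0b10111101
```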
If you want to use text from another charset (Latin-1 in this case) then you need to transcode it to UTF-8 first, using functions such as iconv() or mb_convert_encoding().
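As a rough sketch of that transcoding step, written in Python to match the example above rather than with PHP's own functions: the helper name `to_utf8` is hypothetical, and treating every non-UTF-8 input as Latin-1 is an assumption that only holds if you know where the bytes came from.

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw as UTF-8 bytes (hypothetical helper).

    Input that is already valid UTF-8 is passed through unchanged.
    Anything else is ASSUMED to be in the fallback charset and transcoded.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")

print(to_utf8(b"\xbd"))        # b'\xc2\xbd'  (Latin-1 byte transcoded)
print(to_utf8(b"\xc2\xbd"))    # b'\xc2\xbd'  (already UTF-8, unchanged)
```

Guessing the source charset by trial decoding is only a heuristic. If the real source encoding is known, for example from the form's declared charset, transcoding from that encoding directly is more reliable.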
Also, the diamond question mark does have a real name:
$ charinfo �
U+FFFD REPLACEMENT CHARACTER
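The same lookup can be reproduced without the `charinfo` tool. This is a small sketch using Python's standard `unicodedata` module; it also shows that a lenient decoder substitutes this character for the invalid byte from the question.

```python
import unicodedata

# Decoding the invalid byte leniently yields the replacement character
# instead of raising an error.
shown = b"\xbd".decode("utf-8", errors="replace")

print(f"U+{ord(shown):04X}")       # U+FFFD
print(unicodedata.name(shown))     # REPLACEMENT CHARACTER
```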
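\xBD is invalid as UTF-8; what you want is \xC2\xBD. The question mark is what applications substitute for invalid sequences, so if you see it in your UTF-8 text, the text is either not UTF-8 or is corrupted.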