
PHP character encoding problems

I need help with a character encoding problem that I want to sort out once and for all. Here is an example of some content which I pull from an XML feed, insert into my database and then pull out.

As you can see, a lot of special HTML characters get corrupted/broken.

How can I once and for all stop this? How am I able to support all types of characters, etc.?

I've tried literally every piece of code I can find. Sometimes it fixes most characters, but others are still corrupted.

To absolutely once and for all make sure you will never have problems with encoding again:

Use UTF-8 everywhere and on everything!

That is (if you use MySQL and PHP):

  • Set all the tables in your database to a UTF-8 collation, for example "utf8_general_ci".
  • Once you establish the database connection, run the following SQL query: "SET NAMES 'utf8'"
  • Always make sure the settings of your editor are set to UTF-8 encoding.
  • Have the following meta tag in the <head> section of your HTML documents:

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
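With PDO, the connection charset can be pinned in the DSN instead of running a separate SET NAMES query. This is only a minimal sketch, assuming PDO with the MySQL driver is available; the helper names `build_utf8_dsn` and `connect_utf8`, and any host, database name or credentials you pass in, are placeholders and not from the question. It defaults to `utf8mb4`, which is MySQL's full four-byte UTF-8 charset (MySQL's `utf8` stores at most three bytes per character).

```php
<?php
// Hypothetical helper: builds a PDO DSN that pins the connection charset.
function build_utf8_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: opens a connection that talks UTF-8 from the first query on.
// Errors are raised as exceptions instead of being silently ignored.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(build_utf8_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}
```

If you use mysqli rather than PDO, `$mysqli->set_charset('utf8mb4')` right after connecting serves the same purpose.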

And a couple of bonus tips:

OR:

You can just use one simple server-side configuration file that takes care of all the encoding. In that case you won't need header and/or meta tags at all, or any php.ini modification. Just add your desired character set to an .htaccess file and put it in your www root. If you want to manipulate character sets on strings in your PHP code, that's another story. The database collation must of course be correct.
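On Apache, the directive usually meant here is `AddDefaultCharset`. A minimal sketch, assuming Apache and that `.htaccess` overrides are allowed for the directory (`AllowOverride FileInfo` or broader):

```apache
# Adds "; charset=UTF-8" to the Content-Type header of text/html and text/plain responses
AddDefaultCharset UTF-8
```

Other content types, such as an XML feed you serve yourself, are not touched by this directive.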

Footnote: UTF-8 is not the encoding solution; it's a solution. It doesn't matter which character set/encoding you use, as long as the environment it is used in has been taken into consideration.

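My favorite article on encoding, from Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)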

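After connecting to the database, but before running any transactions, execute the following line to make sure all database communication is UTF-8: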

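// The mysql_* extension was removed in PHP 7; with mysqli the connection ($dbconn) is the first argument.
mysqli_query($dbconn, "SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'");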

It seems that UTF-8 encoded text is being interpreted as ISO 8859-1.
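You can reproduce that kind of corruption in a few lines. A minimal sketch, assuming the mbstring extension is loaded; the function name `misread_as_latin1` is made up for this illustration:

```php
<?php
// Hypothetical helper: treats each byte of a UTF-8 string as one ISO 8859-1
// character and re-encodes the result as UTF-8, which is what happens when
// UTF-8 data is mislabelled as ISO 8859-1 somewhere along the way.
function misread_as_latin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// "é" as UTF-8 bytes, written as escapes so the result does not depend
// on the encoding of this source file.
$original = "\xC3\xA9";

echo bin2hex(misread_as_latin1($original)), "\n"; // c383c2a9, which displays as "Ã©"
```

One character turning into two (typically starting with "Ã" or "Â") is the usual sign of this particular mix-up.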

If you're processing XML documents, you have to use the encoding given either in the charset parameter of the HTTP Content-Type header field or in the encoding attribute of the XML declaration. If neither is given, the XML specification declares UTF-8 or UTF-16 as the default character encoding, and you have to detect which one is in use.
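A rough sketch of that lookup, under stated assumptions: the function name `detect_xml_encoding` is made up, checking the HTTP header before the XML declaration is my choice of order, and the sketch does no byte order mark sniffing, so unlabelled UTF-16 is not handled.

```php
<?php
// Hypothetical helper: picks the encoding label to use for an XML document.
// Order (an assumption of this sketch): HTTP charset parameter first,
// then the encoding attribute of the XML declaration, then UTF-8 as the fallback.
function detect_xml_encoding(string $xml, ?string $contentType = null): string
{
    // Looks for a charset parameter such as "application/xml; charset=ISO-8859-1".
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // Looks for encoding="..." inside an XML declaration at the very start of the document.
    if (preg_match('/^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/', $xml, $m)) {
        return strtoupper($m[1]);
    }
    return 'UTF-8';
}
```

The returned label can then be passed as the source encoding to `iconv()` or `mb_convert_encoding()` before the text goes into the database.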

It looks like the link you gave has data that is encoded in utf-8. (Follow that link, then change the encoding of your browser to utf-8).

It sounds like you are having problems with inserting into and retrieving from your database. Make sure your database table has UTF-8 set as its encoding.

First off, make sure your database's character encoding is set to support UTF-8. Secondly, PHP's iconv is going to be your friend. Finally, ensure that your response headers are sending the proper character encoding (again, UTF-8).
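For the iconv step, something along these lines might do. It is a sketch, assuming the iconv extension is available and that you already know the source encoding of the feed; the wrapper name `feed_to_utf8` is hypothetical:

```php
<?php
// Hypothetical helper: converts feed text from a known source encoding to UTF-8.
// iconv() returns false on failure, which is turned into an exception here.
function feed_to_utf8(string $text, string $sourceEncoding): string
{
    $converted = iconv($sourceEncoding, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceEncoding to UTF-8");
    }
    return $converted;
}

// "café" in ISO 8859-1: the single byte 0xE9 becomes the two bytes 0xC3 0xA9.
echo bin2hex(feed_to_utf8("caf\xE9", 'ISO-8859-1')), "\n"; // 636166c3a9
```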

Did you try utf8_encode() and utf8_decode()?

Which one you use will depend entirely on how your data is encoded, which you don't specify, but they are quite useful in this kind of case.
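For reference, utf8_encode() converts ISO 8859-1 to UTF-8 and utf8_decode() does the reverse, and both are deprecated as of PHP 8.2. Calling utf8_encode() on text that is already UTF-8 double-encodes it. A guarded conversion could look like the sketch below, assuming the mbstring extension; the name `ensure_utf8` and the ISO 8859-1 default are assumptions for illustration:

```php
<?php
// Hypothetical helper: converts to UTF-8 only when the input is not already
// valid UTF-8, which avoids double encoding.
// $assumed is a guess about the legacy encoding; a byte string does not
// record which encoding it was written in.
function ensure_utf8(string $text, string $assumed = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it alone
    }
    return mb_convert_encoding($text, 'UTF-8', $assumed);
}
```

The check is a heuristic: a short ISO 8859-1 string can happen to be valid UTF-8 as well, so knowing the real source encoding is still the more reliable route.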

header('Content-Type: text/html; charset=UTF-8');

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities. 
 *
 * @param string $var 
 * @return string 
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

Those two rescued me and I think it is now working. I'll come back if I continue to encounter problems. Should I store it in the DB as "&" or as "&amp;"?


 