
UTF-8 Problem, no Idea

I have a strange problem with some documents on my webpage.

My data is stored in a MySQL database, UTF-8 encoded. When I read the values, my webpage displays:

Rezept : Gem�se mal anders (Gem�selaibchen)

I need ü / &uuml;!

The content in the database is "Gemüse ...".

The raw data in my error_log looks like this:

[title] => Rezept : Gemüse mal anders (Gemüselaibchen)

The webpage header is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
<!--[if IE]>
  <link rel="stylesheet" href="http://www.dev-twitter-gewitter.com/css//blueprint/ie.css" 
        type="text/css" media="screen, projection">
<![endif]-->

<meta name="text/html; charset=UTF-8" content="Content-Type" />

You have to set the encoding of your web page.

There are three ways to set the encoding:

  1. HTML/XHTML : Use an HTTP header:

     Content-Type: text/html; charset=UTF-8 
  2. HTML : Use a meta element: (Also possible for XHTML, but somewhat unusual)

     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> 
  3. XHTML only : Set the encoding in the XML declaration: ( Preferred for XHTML )

     <?xml version="1.0" encoding="UTF-8"?> 

If you want to verify the problem first:

First change the encoding manually using your browser. If that works you can set it in your HTML file. Make sure you reset the manual encoding to automatic detection, otherwise it'll work on your workstation, but not on your users' workstations!

A PHP specialty: make sure your internal encoding is set to UTF-8, too! The mbstring functions assume this encoding by default, and output can be converted from it if an mbstring output handler is active.

You can enforce the internal encoding using mb_internal_encoding at the top of every file.
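As a minimal sketch of that suggestion, assuming the mbstring extension is loaded: the variable names are only illustrative, and the sample word is written with explicit byte escapes so the result does not depend on how the source file itself is saved.

```php
<?php
// Set the encoding that mbstring functions assume when none is passed explicitly.
mb_internal_encoding('UTF-8');

// "Gemüse" as UTF-8 bytes: ü is the two-byte sequence 0xC3 0xBC.
$word = "Gem\xC3\xBCse";

// strlen() counts bytes; mb_strlen() counts characters in the internal encoding.
$byteCount = strlen($word);    // 7
$charCount = mb_strlen($word); // 6
```

If the two counts differ as shown, the string is being treated as multi-byte UTF-8 rather than as single-byte text.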

Finally: none of this helps if your content isn't actually UTF-8 encoded! If it is, check whether any helper functions might be destroying the UTF-8 encoding.

MySQL needs to know you want the output as UTF-8 - it's likely configured to send as latin1, so your browser sees the invalid UTF-8 byte sequences and outputs the "not a character" glyph.

Send the query "SET NAMES utf8" immediately after opening the MySQL connection, or change the configuration (if possible).

That Unicode replacement character only appears when the encoding is incorrect. So in your case you declared your data as UTF-8 encoded but it wasn't (at least the part you quoted). The ü encoded in ISO 8859-1 is 0xFC, but that's an invalid octet in UTF-8.

So you need to make sure that your data is actually encoded with UTF-8. There are functions that can check if a given string is UTF-8, e.g. mb_detect_encoding or this is_utf8 function.
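A small sketch of such a check, assuming the mbstring extension is available. It uses mb_check_encoding rather than mb_detect_encoding, because it answers the yes/no question "is this valid UTF-8?" directly. The two sample strings are hand-written byte sequences for "Gemüse", not data from the real database.

```php
<?php
// "Gemüse" as ISO-8859-1 bytes: ü is the single byte 0xFC.
$latin1 = "Gem\xFCse";
// "Gemüse" as UTF-8 bytes: ü is the two-byte sequence 0xC3 0xBC.
$utf8 = "Gem\xC3\xBCse";

// A lone 0xFC byte is not valid UTF-8, so the first check fails.
$latin1IsValid = mb_check_encoding($latin1, 'UTF-8'); // false
$utf8IsValid   = mb_check_encoding($utf8, 'UTF-8');   // true

// bin2hex() shows the raw bytes, which is handy in an error_log.
$latin1Hex = bin2hex($latin1); // 47656dfc7365
$utf8Hex   = bin2hex($utf8);   // 47656dc3bc7365
```

Logging the hex dump of the fetched title would show which of the two byte sequences is actually arriving from the database.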

Do this:

header('Content-Type: text/html; charset=utf-8');

before outputting any content.

The problem is likely that the connection to the database uses latin1. As far as I know, this is the default in many MySQL setups.

That means, even if you store the data as utf-8 in the database you will get it as latin1 when you fetch it, as the charset is converted on the fly to match the connection.

You have two options:

1. Change the default connection character set to be utf-8

This could cause trouble if other applications hosted on the same database server expect ISO-8859-1 from the database: changing the configuration changes the behaviour for all users of the MySQL server.

2. Change the connection charset after each connect to the database

If you use PHP 5 (5.2.3 or later), you can use the built-in function:

mysql_set_charset('utf8');

See http://php.net/manual/en/function.mysql-set-charset.php for more details.

If you are on PHP 4 you can do this by a simple SQL query like so:

mysql_query("SET NAMES 'UTF8'");

See http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html for more details.
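Note that the mysql_* functions were removed in PHP 7.0, so on current PHP versions the same idea has to be expressed with PDO or mysqli. A hedged sketch follows: build_mysql_dsn is a hypothetical helper, and the host, database name, and credentials are placeholders, not values from the question. The connection lines are left commented out because they need a real server.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that fixes the connection charset.
// 'utf8' matches the answer above; 'utf8mb4' can be passed for full 4-byte UTF-8.
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = build_mysql_dsn('localhost', 'recipes');

// With real credentials (placeholders here), PDO would connect like this:
// $pdo = new PDO($dsn, 'db_user', 'db_password');

// The mysqli counterpart of mysql_set_charset() is:
// $mysqli = new mysqli('localhost', 'db_user', 'db_password', 'recipes');
// $mysqli->set_charset('utf8');
```

Setting the charset in the DSN or via set_charset keeps the client library informed about the encoding, which a plain SET NAMES query does not.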

utf8_encode fixed my problem. I am not sure why, though; the data in the database is UTF-8 and the website is UTF-8.
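A likely explanation, consistent with the answers about the connection charset: utf8_encode converts ISO-8859-1 to UTF-8, so it only helps if the bytes arriving from the connection are latin1. The sketch below illustrates this with hand-written bytes (an assumption about what the connection delivered) and uses mb_convert_encoding, since utf8_encode is deprecated as of PHP 8.2. It also shows what happens if the same conversion is applied to data that is already UTF-8.

```php
<?php
// Bytes as a latin1 connection would deliver "Gemüse" (assumed for illustration).
$fromDatabase = "Gem\xFCse";

// Convert ISO-8859-1 to UTF-8: 0xFC becomes the two bytes 0xC3 0xBC.
$converted = mb_convert_encoding($fromDatabase, 'UTF-8', 'ISO-8859-1');
$convertedHex = bin2hex($converted); // 47656dc3bc7365

// Caution: converting data that is already UTF-8 double-encodes it ("GemÃ¼se").
$alreadyUtf8 = "Gem\xC3\xBCse";
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');
$doubleEncodedHex = bin2hex($doubleEncoded); // 47656dc383c2bc7365
```

So the conversion works as a patch, but it will break again once the connection charset is switched to UTF-8; fixing the connection is the cleaner solution.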

You should check the HTTP headers too, especially how your web server is configured. I had a similar issue in the past which was caused by the Apache configuration: it was set up to always send an encoding in the Content-Type header, and that overrode the encoding given in the <meta> tag, because the HTML page and the web server disagreed on that value.
