
How to convert an unknown character set into UTF-8?

I have the string Mühle saved as MÃ¼hle in a UTF-8 database. I want it stored as proper UTF-8 so that it displays correctly on my web page, which also uses UTF-8.

I think the string was not converted into UTF-8 before it was written to the database, and now it won't display properly on my web page.

I tried selecting this string from my MySQL database and converting it into UTF-8, but it didn't work. I also tried decoding it multiple times, but that didn't work either. The code I used is below:

$string = 'MÃ¼hle';
$string = utf8_encode($string);
echo $string;

and

$string = 'MÃ¼hle';
$string = utf8_decode($string);
$string = utf8_encode($string);
echo $string;

The output of the above code was the same as the input in both cases, not changing anything about the string.

How can I convert this string so that I can update it in my MySQL database and have it display properly as Mühle the next time I select it?

Your string is double-encoded UTF-8 - that is, UTF-8 that was interpreted as Latin-1 and then re-encoded to UTF-8.
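To make the byte-level effect concrete, here is a minimal sketch of how double encoding arises. It assumes the mbstring extension is available and uses mb_convert_encoding in place of utf8_encode, which performs the same Latin-1 to UTF-8 conversion but is deprecated as of PHP 8.2. The variable names are only for illustration.

```php
<?php
// "Mühle" as correct UTF-8 bytes: the "ü" is the two bytes C3 BC.
$correct = "M\xC3\xBChle";

// Misread those bytes as Latin-1 and re-encode them to UTF-8.
// Each of the two bytes of "ü" becomes its own two-byte character ("Ã" and "¼").
$double = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

echo bin2hex($correct), "\n"; // 4dc3bc686c65
echo bin2hex($double), "\n";  // 4dc383c2bc686c65
```

Comparing hex dumps like this is usually more reliable than looking at the rendered text, since the terminal or browser adds its own interpretation.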

This can happen when your character encodings don't match, e.g. when you send UTF-8 data while MySQL expects the connection to use Latin-1. To fix this, call mysqli_set_charset (or the equivalent function for your database API) as soon as you create the connection, or modify the MySQL configuration to use UTF-8 connections by default.
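As a rough sketch of setting the charset right after connecting, assuming the mysqli extension is used: connect_utf8 is a hypothetical helper name, and the 'utf8mb4' default is an assumption (use 'utf8' if that is what your tables and server are configured for).

```php
<?php
// Hypothetical helper: open a mysqli connection and set its charset immediately.
function connect_utf8(
    string $host,
    string $user,
    string $pass,
    string $db,
    string $charset = 'utf8mb4' // assumption; adjust to your setup
): mysqli {
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $pass, $db);

    // Tell the server which encoding the client sends and expects to receive.
    $mysqli->set_charset($charset);

    return $mysqli;
}
```

Every query issued through the returned connection then travels in the chosen encoding, so new rows are no longer re-encoded on the way in.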

Furthermore, you will need to fix your data. This is done by applying utf8_decode an appropriate number of times. If "MÃ¼hle" is the exact byte sequence returned by your database over a UTF-8 connection, you need to read that string, send it through utf8_decode once, and then update that row (still using a UTF-8 connection).
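A minimal sketch of that single decoding step, assuming the mbstring extension: undo_one_layer is a hypothetical name, and mb_convert_encoding stands in for utf8_decode (same UTF-8 to Latin-1 conversion, but not deprecated in PHP 8.2).

```php
<?php
// Hypothetical helper: strip one layer of double encoding.
// Converting UTF-8 to Latin-1 yields the bytes of the original UTF-8 text.
function undo_one_layer(string $s): string
{
    return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
}

// Double-encoded "Mühle", as it would arrive over a UTF-8 connection.
$stored = "M\xC3\x83\xC2\xBChle";
$fixed  = undo_one_layer($stored);

var_dump($fixed === "M\xC3\xBChle");          // bool(true)
var_dump(mb_check_encoding($fixed, 'UTF-8')); // bool(true)
```

The repaired value is what you would write back with an UPDATE over the same UTF-8 connection.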

Please note that when you select a row in MySQL, it gets converted from the table character set to the connection character set before being sent back to the client. So, if the text still looks garbled after one utf8_decode while MySQL is using a UTF-8 connection and you're displaying the string as UTF-8, you need to call utf8_decode a second time, because the stored text is actually triple-encoded. Double-check everything, preferably using a well-developed MySQL client like phpMyAdmin: until it is displayed properly there, your data is still encoded incorrectly.
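If you don't know how many layers there are, one possible approach is to decode repeatedly and stop as soon as another round would no longer produce valid UTF-8. The sketch below is a heuristic under stated assumptions, not a guaranteed fix: fix_multi_encoded and $maxRounds are hypothetical names, mbstring is assumed, and text that legitimately contains a sequence like "Ã¼" would be decoded wrongly, so check the results before writing anything back.

```php
<?php
// Hypothetical helper: undo repeated Latin-1 -> UTF-8 mis-encoding.
function fix_multi_encoded(string $s, int $maxRounds = 5): string
{
    for ($i = 0; $i < $maxRounds; $i++) {
        // Nothing to undo if the input is not valid UTF-8, is pure ASCII,
        // or contains characters above U+00FF (those cannot come from
        // Latin-1 bytes that were re-encoded as UTF-8).
        if (!mb_check_encoding($s, 'UTF-8')
            || !preg_match('/[^\x00-\x7F]/', $s)
            || preg_match('/[^\x{00}-\x{FF}]/u', $s)
        ) {
            break;
        }

        // Undo one layer: UTF-8 -> Latin-1 bytes.
        $candidate = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

        // If the result is not valid UTF-8, $s was already the real text.
        if (!mb_check_encoding($candidate, 'UTF-8')) {
            break;
        }
        $s = $candidate;
    }
    return $s;
}

// Triple-encoded "Mühle" collapses back to the correct six bytes.
echo bin2hex(fix_multi_encoded("M\xC3\x83\xC2\x83\xC3\x82\xC2\xBChle")), "\n"; // 4dc3bc686c65
```

A correctly encoded string passes through unchanged, because decoding it once more would yield a lone Latin-1 byte that is not valid UTF-8.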

If this is only an issue with a few rows, manual fixing is okay; if it's a general problem with your database, you might prefer to dump an SQL script, convert that file, and use it to replace your old data.

Try the following function. It'll convert the string back to UTF-8.

// Maps Windows-1252 smart-quote bytes and common mojibake sequences
// to the intended characters or to ASCII equivalents.
function convert_smart_quotes($string)
{
    $string = htmlentities($string);
    $string = mb_convert_encoding($string, 'HTML-ENTITIES', 'utf-8');
    $string = htmlspecialchars_decode(utf8_decode(htmlentities($string, ENT_COMPAT, 'utf-8', false)));

    $s = array(
        // Windows-1252 single bytes
        chr(145) => "'",
        chr(146) => "'",
        chr(147) => '"',
        chr(148) => '"',
        chr(151) => '-',
        // UTF-8 sequences that were misread as Windows-1252
        's©' => '©',
        'Â®' => '®',
        'â„¢' => '™', // trademark sign
        'â€œ' => '"', // left side double smart quote
        'â€' => '"', // right side double smart quote
        'â€˜' => "'", // left side single smart quote
        'â€™' => "'", // right side single smart quote
        'â€¦' => '...', // ellipsis
        'â€”' => '-', // em dash
        'â€“' => '-', // en dash
    );

    return strtr($string, $s);
}


 