How to convert an unknown character set into UTF-8?

I have the string Mühle saved as MÃ¼hle in a UTF-8 database. I want it stored as proper UTF-8 so that it displays correctly on my web page, which also uses UTF-8.

I think the string was not converted into UTF-8 before it was written to the database, and now it won't display properly on my web page.

I tried selecting this string from my MySQL database and converting it into UTF-8, but that didn't work. I also tried decoding it multiple times, but that didn't work either. See the code I used below:

$string = "MÃ¼hle";
$string = utf8_encode($string);
echo $string;

and

$string = "MÃ¼hle";
$string = utf8_decode($string);
$string = utf8_encode($string);
echo $string;

In both cases the output of the above code was the same as the input; the string was not changed at all.

How can I convert this string so that I can update it in my MySQL database and have it display properly as Mühle the next time I select it?

Your string is double-encoded UTF-8, that is, UTF-8 that was interpreted as Latin-1 and then re-encoded to UTF-8.
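
To make the double encoding concrete, here is a small sketch that reproduces it and then removes one layer. It assumes the mbstring extension and PHP 7 or later (for the "\u{...}" escapes), and uses mb_convert_encoding as a stand-in for utf8_encode/utf8_decode, which are deprecated as of PHP 8.2. The variable names are only illustrative.

```php
<?php
// Illustration only: reproduce the double encoding, then undo one layer.

$correct = "M\u{00FC}hle"; // "Mühle" as proper UTF-8

// Misreading the UTF-8 bytes as Latin-1 and encoding them to UTF-8 again
// turns the two bytes of "ü" (C3 BC) into two characters, "Ã" and "¼".
// This is the same conversion utf8_encode() performs.
$double = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

// Converting from UTF-8 back to Latin-1 removes that extra layer.
// This is the same conversion utf8_decode() performs.
$fixed = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');

echo bin2hex($correct), "\n"; // 4dc3bc686c65
echo bin2hex($double), "\n";  // 4dc383c2bc686c65
echo bin2hex($fixed), "\n";   // 4dc3bc686c65
```

The hex dumps show that the double-encoded form carries two extra bytes for the single "ü", and that one decode step brings the original bytes back.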

This can happen when the character encodings are mismatched, e.g. when you send UTF-8 data while MySQL expects the connection to use Latin-1. To fix this, call mysqli_set_charset (or the equivalent function for your database API) as soon as you create the connection, or change the MySQL configuration to use UTF-8 connections by default.
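
As a rough sketch of the first option, the connection character set can be set right after connecting. The helper name open_utf8_connection and its parameters are hypothetical, and the choice of 'utf8mb4' (MySQL's full UTF-8 character set) rather than 'utf8' is an assumption; use whichever matches your tables.

```php
<?php
// Hypothetical helper: the name and parameters are illustrative only.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $link = mysqli_connect($host, $user, $pass, $db);

    // Declare the connection character set before running any query,
    // so MySQL does not assume the client is sending Latin-1.
    mysqli_set_charset($link, 'utf8mb4');

    return $link;
}
```

Every query sent over the returned connection, including the UPDATE that repairs a row, is then treated as UTF-8 by the server.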

Furthermore, you will need to fix your data by applying utf8_decode the appropriate number of times. If "MÃ¼hle" is the exact sequence of bytes returned by your database over a UTF-8 connection, you need to read that string, pass it through utf8_decode, and then update that row (still using a UTF-8 connection).

Please note that when you select a row in MySQL, the data is converted from the table character set to the connection character set before it is sent back to the client. So if you are seeing the garbled form of "Mühle" on your screen while MySQL is using a UTF-8 connection and you are displaying the string as UTF-8, you need to call utf8_decode twice to fix it, because the string is actually triple-encoded: twice in the database text, and once for the display. Double-check everything, preferably with a well-developed MySQL client such as phpMyAdmin. Until the text is displayed properly there, your data is still encoded incorrectly.
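
Since the number of extra layers may not be known in advance, one possible approach is to keep decoding for as long as a step is lossless and still yields valid UTF-8. The function below is a heuristic sketch, not part of the original answer: the name strip_extra_utf8_layers and the $maxLayers limit are made up for illustration, and it assumes the mbstring extension. It would also "repair" a string that legitimately contains text such as "Ã¼", so check the results before writing them back.

```php
<?php
// Hypothetical helper: removes extra UTF-8 layers and reports how many
// were removed. Returns [repaired text, number of layers stripped].
function strip_extra_utf8_layers(string $text, int $maxLayers = 5): array
{
    $layers = 0;
    while ($layers < $maxLayers) {
        // Same conversion as utf8_decode(): UTF-8 in, Latin-1 bytes out.
        $decoded = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

        // Stop when nothing changed (plain ASCII), when the result is no
        // longer valid UTF-8, or when the step lost information (re-encoding
        // the result does not give back the input).
        if ($decoded === $text
            || !mb_check_encoding($decoded, 'UTF-8')
            || mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1') !== $text) {
            break;
        }

        $text = $decoded;
        $layers++;
    }
    return [$text, $layers];
}

// Usage: a double-encoded "Mühle" needs one decode step.
[$repaired, $count] = strip_extra_utf8_layers("M\u{00C3}\u{00BC}hle");
echo bin2hex($repaired), ' ', $count, "\n"; // 4dc3bc686c65 1
```

A correctly encoded string comes back unchanged with a count of 0, because decoding it would produce bytes that are not valid UTF-8.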

If this is only an issue with a few rows, manual fixing is okay; if it's a general problem with your database, you might prefer to dump an SQL script, convert that file, and use it to replace your old data.

Try the following function. It'll convert the string back to UTF-8.

function convert_smart_quotes($string)
{
    $string = htmlentities($string);
    $string = mb_convert_encoding($string, 'HTML-ENTITIES', 'utf-8');
    $string = htmlspecialchars_decode(utf8_decode(htmlentities($string, ENT_COMPAT, 'utf-8', false)));

    // Map Windows-1252 smart punctuation bytes and common mojibake
    // sequences to their intended characters.
    $s = array(
        chr(145) => "'",
        chr(146) => "'",
        chr(147) => '"',
        chr(148) => '"',
        chr(151) => '-',
        'Â©' => '©',
        'Â®' => '®',
        'â„¢' => '™', //™
        'â€œ' => '"', // left side double smart quote
        'â€' => '"', // right side double smart quote
        'â€˜' => "'", // left side single smart quote
        'â€™' => "'", // right side single smart quote
        'â€¦' => '...', // ellipsis
        'â€”' => '-', // em dash
        'â€“' => '-', // en dash
    );

    return strtr($string, $s);
}
