Character Encoding Issue: UTF-8 and ISO-8859-1

I have a web application that fails to display Japanese/Chinese characters properly. The odd part is that the characters display correctly when I hard-code them into an HTML document.

Characters such as:

アイヌの工芸 : ペンシルバニア大学考古学人類学博物館ヒラーコレクション

But when I pull them out of this proprietary database, they come out as junk:

ã¢ã¤ãã®å·¥è¸ : ãã³ã·ã«ããã¢å¤§å­¦èå¤å­¦äººé¡å­¦åç©é¤¨ãã©ã¼ã³ã¬ã¯ã·ã§ã³

The HTML document is declared as UTF-8:

<meta http-equiv="content-type" content="text/html; charset=utf-8"/>

The HTML file itself is saved as UTF-8, not as ISO-8859-1, Western Latin, etc.

The weird thing is that when I use iconv to convert the junk string from UTF-8 to ISO-8859-1, it displays correctly.

iconv("UTF-8", "ISO-8859-1//TRANSLIT", $junk_string)

It seems like the junk string is UTF-8, and when I convert it to ISO-8859-1 the characters display correctly. This doesn't make sense to me at all.

So I sort of have an answer to my problem, but I do not know why it works. I thought that encoding everything in UTF-8 was supposed to fix this kind of thing. I am using Verdana, but I have tried a couple of other fonts with no success. The weird thing is that I can hard-code the characters into the HTML page and they display fine, but when I get the same data from the database it is displayed as junk unless I convert it to ISO-8859-1.

Does anyone have any insight here? And instead of doing this to every piece of data fetched from the database, is there a way to change this at the individual page level? I also tried changing the encoding to

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>

And the characters from the database still do not display correctly.

Just a guess, but when the database is UTF-8 and the HTML document is UTF-8, the problem is most likely the database connection, at least in my experience with MySQL.

For example, with the old mysql_* extension (removed in PHP 7.0), you need to set the character set after opening the connection:

mysql_set_charset('utf8');

For PDO / MySQL it would be something like:

$db->exec('SET CHARACTER SET utf8');
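With current PHP versions the connection character set can usually be declared when the connection is opened rather than through a separate statement. The following is a minimal sketch, assuming a MySQL server; buildMysqlDsn, the host name, and the database name "catalog" are hypothetical and not from the original question, which involves a proprietary database. The charset value utf8mb4 is MySQL's name for full UTF-8; the utf8 used above is a three-byte subset.

```php
<?php
// Hypothetical helper (not part of PDO): builds a PDO MySQL DSN that
// declares the connection character set up front.
function buildMysqlDsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Placeholder host and database name, for illustration only.
$dsn = buildMysqlDsn('localhost', 'catalog');
echo $dsn, PHP_EOL;

// Usage, commented out because it needs a live server and real credentials:
// $db = new PDO($dsn, 'user', 'password');

// The mysqli equivalent sets the charset on an open connection:
// $mysqli = new mysqli('localhost', 'user', 'password', 'catalog');
// $mysqli->set_charset('utf8mb4');
```

Declaring the charset on the connection makes the server and PHP agree on how the bytes are interpreted, so no per-string conversion is needed afterwards.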

The answer is that you have wrong data in the database. What probably happened is that you ran an ISO-8859-1 -> UTF-8 conversion on data that was already UTF-8. Therefore, a UTF-8 -> ISO-8859-1 conversion gives you the original UTF-8 data back.

Make sure you're not calling utf8_encode (which does an ISO-8859-1 -> UTF-8 conversion) on UTF-8 data!
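To see why the reverse conversion appears to repair the data, here is a minimal sketch of the double-encoding round trip. It uses iconv, as in the question; the variable names are illustrative, and it assumes the stored data went through exactly one spurious ISO-8859-1 -> UTF-8 pass.

```php
<?php
// Correct UTF-8 text, as it should have been stored.
$original = "アイヌの工芸";

// The mistake: the UTF-8 bytes are treated as ISO-8859-1 and encoded
// to UTF-8 a second time. Every original byte becomes its own character.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $original);

// The "fix" from the question: converting UTF-8 -> ISO-8859-1 undoes
// that extra pass and yields the original UTF-8 bytes again.
$repaired = iconv('UTF-8', 'ISO-8859-1', $doubleEncoded);

var_dump(strlen($original));        // byte length of the correct text
var_dump(strlen($doubleEncoded));   // twice as long: each byte was >= 0x80
var_dump($repaired === $original);  // true when the round trip is lossless
```

The repaired string is not really ISO-8859-1 text; it is the original UTF-8 byte sequence, which is why it displays correctly on a UTF-8 page.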

Since every UTF-8 string is also a valid ISO-8859-1 string (well, not quite, but ISO-8859-1 is commonly extended so that this is the case), the ISO-8859-1 -> UTF-8 conversion raises no errors when it is applied to UTF-8 data.

This might be because PHP does not deal with UTF-8 natively:

A string is series of characters, where a character is the same as a byte. This means that PHP only supports a 256-character set, and hence does not offer native Unicode support.

Source: http://php.net/manual/en/language.types.string.php

So when receiving UTF-8 encoded data from your database, you either want to:

  • Transcode your data to a single-byte encoded string for native support (with utf8_decode or iconv), BUT you may lose characters (in your case, a lot...)

  • Or manipulate your data with the functions PHP offers for multibyte strings
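The two options above can be sketched as follows. This is an illustration only, assuming the mbstring extension is loaded; the sample string is taken from the question and the variable names are made up.

```php
<?php
$title = "アイヌの工芸";

// Byte-oriented functions count bytes, not characters.
$byteCount = strlen($title);

// Multibyte functions take the encoding into account.
$charCount  = mb_strlen($title, 'UTF-8');
$firstThree = mb_substr($title, 0, 3, 'UTF-8');

// Option 1 from the list: transcoding to a single-byte encoding.
// ISO-8859-1 has no Japanese characters, so the text cannot survive.
$lossy = mb_convert_encoding($title, 'ISO-8859-1', 'UTF-8');

var_dump($byteCount, $charCount, $firstThree);
var_dump($lossy === $title); // false: the characters were lost
```

For text like this, keeping the data in UTF-8 and using the mb_* functions avoids the loss that a single-byte conversion causes.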
