Converting a string into UTF-8

I've been reading a lot of answers on here, but whatever I try, I cannot work out how to fix this.

The Problem

I have data which is being imported into a database. This data contains special characters such as ' “ ” - é (but not limited to just those).

They show up as black diamonds when the data is displayed.

What I have tried

I followed this: http://kunststube.net/frontback/ but when I do the import, it just breaks at the first ' and ignores the rest of the string (still inserts correctly).

I've tried converting the string with utf8_encode(), htmlentities() and mb_convert_encoding(). All give varied results, but none resolves the problem fully: some remove some characters, some give little squares in IE, etc.

What I think the problem is

I think the problem has to do with not knowing the original encoding, so I ran mb_detect_encoding() and it returned nothing. What does that mean? I guess it cannot detect the encoding.

So what I'm struggling with is how to encode the string as UTF-8 without breaking it, so that I can store it properly.

Observations

If I set header('Content-Type: text/html; charset=utf-8'); I get the black diamonds, but if I set header('Content-Type: text/html; charset=ISO-8859-1'); it displays correctly.

So, knowing that, should I be displaying my whole website in ISO-8859-1, or should I be converting that string to UTF-8? Is there a preferred way to do this?

When the DB was latin1 and I didn't include a charset in the PDO connection, the data was stored correctly in the database.
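
The header observation above hints that the incoming bytes are in a single-byte Western encoding rather than UTF-8. The following is only a sketch under that assumption: `toUtf8()` is a hypothetical helper, and treating every non-UTF-8 input as Windows-1252 is a guess that has to be checked against the real import source.

```php
<?php
// Hypothetical helper: return $raw as UTF-8.
// Assumption: any input that is not valid UTF-8 is Windows-1252.
function toUtf8($raw)
{
    // Already valid UTF-8 (this includes plain ASCII): leave it untouched.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Windows-1252 is the superset of ISO-8859-1 that browsers actually
    // apply when a page declares ISO-8859-1; it contains the curly quotes.
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// Sample bytes as they would arrive in Windows-1252:
// 0x92 is the curly apostrophe, 0xE9 is e-acute.
$latin = "Turner\x92s d\xE9cor";
$utf8  = toUtf8($latin);
```

Checking validity first means a string that is already UTF-8 is not converted a second time, which is the usual cause of double-encoded output.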

Other

I am using PDO:

new PDO("mysql:host=" . $G['PDO_HOST'] . ";dbname=" . $G['PDO_DB'] . ";charset=utf-8", $G['PDO_USER'], $G['PDO_PASS'],array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"));

A copy of a problem string:

Informed by his eclectic background and varied passions for décor, travel, entertaining and food, Nathan Turner’s American Style will appeal to readers looking to incorporate Turner’s stylish and relaxed aesthetic into their home and life.

Any input on this would be really appreciated. I've been struggling with this for a while.

UPDATE

Here is my table:

CREATE TABLE IF NOT EXISTS `page` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `url` varchar(255) NOT NULL,
  `summary` text NOT NULL,
  `content` text NOT NULL,
  `search` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=937 ;

So the table is in utf8 format.

I have changed the DB connection to:

$dbc = new PDO("mysql:host=" . $G['PDO_HOST'] . ";dbname=" . $G['PDO_DB'], $G['PDO_USER'], $G['PDO_PASS']);
$dbc->query("SET NAMES utf8");

This follows what "Your Common Sense" pointed out about the PHP version.

But now that I have this all set, it cuts off the insert at the first '

String: With a style that is accessible and chic, Turner's aesthetic is Nate meets Colin and the Magazine.

Stored: With a style that is accessible and chic, Turner

UPDATE 2

I am using prepared statements, so the content that is breaking is bound here:

$stmt->bindParam(':content',$content, PDO::PARAM_STR);

Answer

In the PDO DSN, the charset name must be written without a dash:

charset=utf8
           ^ should be NO dash here

Also, if your PHP version is below 5.3.6, the charset option in the DSN won't work anyway; a SET NAMES utf8 query has to be run after connecting.
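
As a rough illustration of the two points above, the sketch below builds the DSN with the dash-free charset and decides whether the SET NAMES fallback is needed. `mysqlDsn()` and `needsSetNames()` are hypothetical helper names, and the host and database values are placeholders, not taken from the question.

```php
<?php
// Hypothetical helper: build a MySQL DSN with the charset spelled "utf8" (no dash).
function mysqlDsn($host, $db, $charset = 'utf8')
{
    return "mysql:host={$host};dbname={$db};charset={$charset}";
}

// Hypothetical helper: true when the given PHP version is below 5.3.6,
// i.e. when SET NAMES must be run after connecting.
function needsSetNames($phpVersion = PHP_VERSION)
{
    return version_compare($phpVersion, '5.3.6', '<');
}

$dsn = mysqlDsn('localhost', 'example_db');   // placeholder host and database

// Connecting is left commented out because it needs a real server
// and real credentials ($user and $pass are placeholders):
// $dbc = new PDO($dsn, $user, $pass);
// if (needsSetNames()) {
//     $dbc->exec('SET NAMES utf8');
// }
```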

As for the inserts, they don't cut anything off. The problem is in your HTML fields.

To output a value into an HTML attribute, always use htmlspecialchars() with the ENT_QUOTES flag.
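
A small sketch of why the string appears to stop at the apostrophe, assuming the value is echoed into a single-quoted attribute of a form field (the question does not show its form markup, so the `<input>` element here is hypothetical):

```php
<?php
$content = "Turner's aesthetic";

// Unescaped: the apostrophe closes the single-quoted attribute early,
// so the browser reads the value as just "Turner".
$broken = "<input type='text' name='content' value='" . $content . "'>";

// Escaped: ENT_QUOTES converts both ' and ", so the full value survives.
$safe = "<input type='text' name='content' value='"
      . htmlspecialchars($content, ENT_QUOTES, 'UTF-8') . "'>";
```

When such a form is submitted again, only the truncated value is sent back, which would make it look as if the insert had cut the string.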

In the database, set the collation of the relevant field in the table to "utf8_general_ci".
