
Odd encoding issue after UTF-8 straightens “most” things out

OK, so we have a script that takes emails sent to Thunderbird, converts part of the message to HTML, and saves it to a MySQL database. Every file and every part written is set to UTF-8. Finally, on my end of the work, the CRM (written in PHP 5.3, with Chrome and Firefox as the expected browsers), I pull the message along with other info and display something resembling Gmail, but as a "task list" for our employees.

The problem I'm having, if you haven't guessed already, is that some customer emails are obviously using different encodings. Thus some (not all, and certainly not the majority) of the emails don't show all characters correctly.

At first I used utf8_encode to get the email messages to look right, and this helps with most messages coming from the database. However, a few slip by with bad characters.

In the DB these "bad apostrophes" appear as ’ , but after utf8_encode they come through as ?? . I've tried various ways of guessing the encoding and converting as needed, but this tends to break the vast majority of the other emails.

Any suggestions, on one end of the pipe or the other, for how I might get these few emails to match everything else, or how I might at least create a preg_replace filter at the end or something?

Update

It seems even the emails with bad characters are passed to the final PHP as UTF-8, according to mb_detect_encoding. This is before any extra encoding. iconv does detect the ones that have problems, but that gives me no way to fix them, and it puts a PHP error box on the screen instead of simply returning FALSE as it says it is supposed to, so this too seems to be no solution.
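For what it's worth, iconv does return FALSE on bad input, but it also raises a notice, which is likely the error box described above. A minimal sketch of how the notice could be kept off the screen follows. The helper name safe_iconv and the sample strings are made up for illustration and are not from the original post.

```php
<?php
// Hypothetical helper: convert with iconv, returning false on bad input
// without letting the notice iconv raises reach the page.
function safe_iconv($from, $to, $text)
{
    // Swallow any warning/notice raised while iconv runs.
    set_error_handler(function () { return true; });
    $result = iconv($from, $to, $text);
    restore_error_handler();
    return $result;
}

$good = "caf\xC3\xA9";    // "café" as valid UTF-8
$bad  = "caf\xE9 ok";     // ISO-8859-1 byte 0xE9, which is not valid UTF-8

// mb_check_encoding tests validity and raises nothing on failure.
var_dump(mb_check_encoding($good, 'UTF-8'));
var_dump(mb_check_encoding($bad, 'UTF-8'));

// The wrapper yields false for the invalid string and a converted string otherwise.
var_dump(safe_iconv('UTF-8', 'ISO-8859-1', $bad));
var_dump(bin2hex(safe_iconv('UTF-8', 'ISO-8859-1', $good)));
```

This only tells you which messages are invalid UTF-8. It does not say what encoding they were actually in, so a conversion step is still needed afterwards.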

The problem is that you don't know the encoding of the mail. utf8_encode converts only from ISO-8859-1 to UTF-8. So you could try to detect the encoding with mb_detect_encoding and then convert to UTF-8 with iconv.
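A rough sketch of that detect-then-convert idea is below. The helper name to_utf8 and the candidate list are assumptions for illustration; the real list depends on what your customers' mail clients send.

```php
<?php
// Hypothetical helper: normalise a message body to UTF-8.
// The candidate list and its order are assumptions, not from the original post.
function to_utf8($text)
{
    $candidates = array('UTF-8', 'ISO-8859-1');
    // Strict mode (third argument) rejects candidates in which the bytes are invalid.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false || $detected === 'UTF-8') {
        return $text; // already UTF-8, or nothing we can safely guess
    }
    return iconv($detected, 'UTF-8', $text);
}

$latin1 = "caf\xE9";       // "café" in ISO-8859-1
$utf8   = "caf\xC3\xA9";   // "café" in UTF-8

echo bin2hex(to_utf8($latin1)), "\n"; // converted to UTF-8 bytes
echo bin2hex(to_utf8($utf8)), "\n";   // left unchanged
```

One caveat: detection works on bytes only. Text that was mis-decoded earlier in the pipeline and then stored is still valid UTF-8, so it will be reported as UTF-8 and pass through unchanged.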

EDIT: You could also try reading the charset from the mail's Content-Type header.
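Reading the declared charset could look something like the sketch below. The function name charset_from_content_type and the sample header are hypothetical, and it assumes you have the raw header line available at the point where the mail is imported. The US-ASCII fallback follows the MIME default for text/plain when no charset is given.

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type header.
function charset_from_content_type($header, $default = 'US-ASCII')
{
    // Matches charset=utf-8, charset="utf-8" and charset='utf-8', case-insensitively.
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $header, $m)) {
        return strtoupper($m[1]);
    }
    return $default; // MIME default when the header names no charset
}

// Sample header and body, made up for illustration.
$header  = 'Content-Type: text/plain; charset="windows-1252"';
$body    = "It\x92s here";   // 0x92 is a curly apostrophe in Windows-1252

$charset = charset_from_content_type($header);
$utf8    = mb_convert_encoding($body, 'UTF-8', $charset);

echo $charset, "\n";
echo bin2hex($utf8), "\n";
```

Converting once, at import time and using the declared charset, avoids having to guess later when the message comes back out of the database.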

Found my answer!

Let me start by saying thanks to Sebastián Grignoli for creating this VERY handy class (raw). I ended up working it into my final solution.

Second, I added the class to CodeIgniter. For any of you using CI, this is an easy implementation. Simply create a file in application/libraries named Encoding.php (yes, with the capital E). Then copy the code into that file, but comment out (or remove) namespace ForceUTF8 on line 40.

My end result looks something like:

echo(Encoding::fixUTF8(utf8_decode($msgHTML)));

I'm still double-checking, but thus far I've yet to find a single error!

If I do find another encoding issue after this, I'll make sure to update.
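One plausible reading of why the utf8_decode step helps (not confirmed anywhere in this thread) is that some messages were encoded to UTF-8 twice. The sketch below shows that effect on a sample string. It uses mb_convert_encoding in place of utf8_encode and utf8_decode, which do the same ISO-8859-1 conversions for valid input, and it leaves out the third-party Encoding class.

```php
<?php
$original = "caf\xC3\xA9";  // "café", already valid UTF-8

// Treating UTF-8 bytes as ISO-8859-1 and encoding again adds a second layer.
// (Same effect as utf8_encode($original).)
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Decoding one layer restores the original bytes.
// (Same effect as utf8_decode($double) for characters within ISO-8859-1.)
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');

echo bin2hex($original), "\n";  // 636166c3a9
echo bin2hex($double), "\n";    // 636166c383c2a9
echo bin2hex($restored), "\n";  // 636166c3a9
```

Note that decoding to ISO-8859-1 cannot represent characters outside that range, such as curly quotes, so it is only safe on text that really is double-encoded.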

SO Question I found that helped.
