
Odd encoding issue after UTF-8 straightens “most” things out

OK, so we have a script that takes emails sent to Thunderbird, converts part of each message to HTML, and saves it to a MySQL database. Every file and every part written is set to UTF-8. On my end of the work, the CRM (written in PHP 5.3, with Chrome and Firefox as the expected browsers) pulls the message along with other info and displays something resembling Gmail, but as a "task list" for our employees.

The problem I'm having, if you haven't guessed already, is that some customer emails are obviously using different encodings. As a result, some (not all, and certainly not the majority) of the emails don't show all characters correctly.

At first I used utf8_encode to get the email messages to look right. This helps with most messages coming from the database; however, a few slip by with bad characters.

In the DB these "bad apostrophes" appear as ’ , but after utf8_encode they come through as ?? . I've tried various ways of guessing the encoding and converting as needed; however, this tends to hurt the vast majority of the other emails.
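For readers wondering where ’ comes from, here is a minimal sketch of the usual cause. It assumes the mbstring extension is loaded and that the apostrophe in question is U+2019 (the right single quotation mark), which is a guess based on the symptom rather than something confirmed in this thread. The three UTF-8 bytes of that character, when misread as Windows-1252 and encoded to UTF-8 a second time, produce exactly this sequence.

```php
<?php
// U+2019 (right single quotation mark) is three bytes in UTF-8: E2 80 99.
$apostrophe = "\xE2\x80\x99";

// Misreading those bytes as Windows-1252 and re-encoding them as UTF-8
// gives the three characters "â", "€" and "™".
$mojibake = mb_convert_encoding($apostrophe, 'UTF-8', 'Windows-1252');
echo $mojibake, "\n";

// Undoing that one step (UTF-8 back to Windows-1252 bytes) recovers the
// original UTF-8 apostrophe.
$repaired = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');
echo $repaired, "\n";
```

Because utf8_encode assumes ISO-8859-1 input, applying it to text that is already UTF-8 adds another layer of the same kind of damage rather than removing one.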

Any suggestions, on one end of the pipe or the other, for how I might get these few emails to match everything else, or how I might at least create a preg_replace filter at the end or something?

Update

It seems even the emails with bad characters are passed to the final PHP as UTF-8, according to mb_detect_encoding . This is before any extra encoding. iconv does detect the ones that have problems, but that gives me no way to fix them, and it puts a PHP error box up on the screen instead of simply returning FALSE as it is supposed to, so this too seems to be no solution.
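One way to get a plain boolean instead of an error box is sketched below. The helper names isValidUtf8 and convertOrFalse are hypothetical, and the sketch assumes the mbstring and iconv extensions are available. iconv does return FALSE on bad input, but it also raises a notice first; the @ operator suppresses that notice so only the return value is left to check.

```php
<?php
// Hypothetical helper: true only if $text is well-formed UTF-8.
function isValidUtf8($text)
{
    return mb_check_encoding($text, 'UTF-8');
}

// Hypothetical helper: convert $text from $fromCharset to UTF-8.
// Returns the converted string, or false if iconv rejects the input.
function convertOrFalse($text, $fromCharset)
{
    // The @ hides the notice iconv raises on illegal input, so the
    // caller sees only the false return value.
    return @iconv($fromCharset, 'UTF-8', $text);
}

var_dump(isValidUtf8("It\xE2\x80\x99s"));   // well-formed UTF-8
var_dump(isValidUtf8("It\x92s"));           // lone 0x92 byte is not UTF-8
var_dump(convertOrFalse("\xE9", 'ISO-8859-1'));
```

Suppressing with @ is a blunt tool; a scoped error handler would be the more careful choice if other notices from the same call matter.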

The problem is that you don't know the encoding of the mail. utf8_encode only converts from ISO-8859-1 to UTF-8. So you could try to detect the encoding with mb_detect_encoding and then convert to UTF-8 with iconv .
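A minimal sketch of that detect-then-convert idea follows, with the conversion skipped for text that is already valid UTF-8 so the majority of messages are left alone. The function name toUtf8Guess and the candidate list are assumptions for illustration; the sketch uses mb_convert_encoding for the conversion step (iconv would work similarly) and assumes mbstring is loaded. Detection is only a guess, since many byte sequences are valid in more than one single-byte encoding.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when needed.
// $candidates is an assumed list of encodings the mail might be using.
function toUtf8Guess($text, $candidates = array('UTF-8', 'Windows-1252'))
{
    // Already well-formed UTF-8: leave it untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Strict mode rejects candidates in which the bytes are invalid.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        return false; // no candidate fits; let the caller decide
    }

    return mb_convert_encoding($text, 'UTF-8', $detected);
}

echo toUtf8Guess("It\x92s"), "\n";          // Windows-1252 apostrophe byte
echo toUtf8Guess("It\xE2\x80\x99s"), "\n";  // already UTF-8, unchanged
```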

EDIT: You could also try to read the charset from the mail's Content-Type header.
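Reading the declared charset could look roughly like the sketch below. The function name charsetFromContentType is hypothetical, and the regular expression is a simplification that handles the common quoted and unquoted forms rather than every case the MIME specifications allow.

```php
<?php
// Hypothetical helper: pull the charset parameter out of a Content-Type
// header value. Returns the charset name, or null if none is declared.
function charsetFromContentType($header)
{
    // Matches charset=utf-8 and charset="utf-8", case-insensitively.
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $header, $m)) {
        return $m[1];
    }
    return null;
}

echo charsetFromContentType('Content-Type: text/html; charset="windows-1252"'), "\n";
echo charsetFromContentType('Content-Type: text/plain; charset=UTF-8; format=flowed'), "\n";
var_dump(charsetFromContentType('Content-Type: text/plain'));
```

The returned name can then be used as the source encoding when converting the body to UTF-8, ideally at the point where the mail is first saved, since the header is no longer available once only the body is in the database.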

Found My Answer!

Let me start by saying thanks to Sebastián Grignoli for creating this VERY handy class (the Encoding class from ForceUTF8). I ended up working it into my final solution.

Second, I added the class to CodeIgniter. For any of you using CI, this is an easy implementation: create a file in application/libraries named Encoding.php (yes, with a capital E). Then copy the code into that file, but comment out (or remove) namespace ForceUTF8 on line 40.

My end result looks something like:

echo(Encoding::fixUTF8(utf8_decode($msgHTML)));

I'm still double checking, but thus far, I've yet to find one single error!

If I do find another encoding issue after this, I'll make sure to update.

SO Question I found that helped.
