
How can I detect the encoding of a string that may be Japanese, Chinese, or English, and convert it to UTF-8 for display?

On a PHP website, I fetch emails over IMAP and save them in a database.

I also want to display some of them. The mailbox receives a lot of English mail, but also Japanese and Chinese.

My problem with the following code is that it cannot detect every charset. If I reorder the array so that Chinese characters come out right, other charsets come out wrong.

<?php
$subject = "板イテ淌"; // can be Japanese
$subject = "这间面积70平"; // can be Chinese
$subject = "This string can have latin1 chars also";

function get_subject($subject)
{
    // Candidate encodings; the result of mb_detect_encoding() depends on this order.
    $encs = array();
    $encs[] = "Big5";
    $encs[] = "euc-kr";
    $encs[] = "EUC-CN";
    $encs[] = "GB2312";
    $encs[] = "ISO-8859-1";
    $encs[] = "GBK";
    $encs[] = "CP936";
    $encs[] = "ASCII";
    $encs[] = "JIS";
    $encs[] = "UTF-8";
    $encs[] = "EUC-JP";
    $encs[] = "SJIS";
    $encs[] = "latin1";

    // Guess the source encoding, then convert to UTF-8.
    $encoding = mb_detect_encoding($subject, $encs);
    $subject = mb_convert_encoding($subject, 'UTF-8', $encoding);
    // Final conversion to ISO-8859-2 (kept from the original code).
    $subject = iconv('utf-8', 'ISO-8859-2', $subject);
    return $subject;
}
?>

If you can't display them, you can't put them into the database correctly either.

You can't tell which encoding a sequence of bytes is in just by looking at the bytes, except for UTF-8, whose byte patterns are distinctive and tightly restricted. Looking at the bytes is exactly what mb_detect_encoding does, so it is useless for anything beyond choosing among a very small number of encodings with mutually exclusive properties.
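To make the ambiguity concrete, here is a minimal sketch. The helper name plausible_encodings is made up for illustration, and the code assumes the mbstring extension is loaded and the source file is saved as UTF-8. It encodes the Chinese sample as CP936 (GBK) and then asks which candidate encodings accept those bytes as well-formed. More than one does, which is why a detector can only guess, and why the order of the candidate list changes the answer.

```php
<?php
// Hypothetical helper: return every candidate encoding in which $bytes is well-formed.
function plausible_encodings(string $bytes, array $candidates): array
{
    $matches = [];
    foreach ($candidates as $name) {
        if (mb_check_encoding($bytes, $name)) {
            $matches[] = $name;
        }
    }
    return $matches;
}

$utf8 = "这间面积70平";                                  // UTF-8 literal (file saved as UTF-8)
$gbk  = mb_convert_encoding($utf8, 'CP936', 'UTF-8');   // the same text as GBK bytes

$candidates = ['UTF-8', 'CP936', 'BIG-5', 'EUC-KR', 'SJIS', 'ISO-8859-1'];

// The GBK bytes are not valid UTF-8, but they are valid CP936 and also valid
// ISO-8859-1 (any byte sequence is), and possibly valid in other multibyte sets.
print_r(plausible_encodings($gbk, $candidates));
```

UTF-8 is the one case where a negative answer is reliable: if mb_check_encoding($bytes, 'UTF-8') returns false, the data is definitely not UTF-8.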

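When you receive the email, read the charset it declares in its headers (the charset parameter of Content-Type for the body, or the charset named inside a MIME encoded-word for the subject) and use that to convert the data to UTF-8. Do not then convert to ISO-8859-2: it is a small single-byte charset that cannot represent Chinese or Japanese, so you will lose most characters.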
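The approach above could look roughly like the following. This is a sketch under assumptions, not a full mail parser: subject_to_utf8 and body_to_utf8 are hypothetical helper names, the raw subject and the Content-Type value are assumed to be already available as strings, the body is assumed to have had its transfer encoding (base64 or quoted-printable) removed, and the code assumes PHP 8 with the iconv and mbstring extensions.

```php
<?php
// Hypothetical helper: decode a raw Subject header such as
// "=?ISO-2022-JP?B?...?=" into UTF-8, using the charset named in the header itself.
function subject_to_utf8(string $rawSubject): string
{
    $decoded = iconv_mime_decode($rawSubject, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
    return $decoded === false ? $rawSubject : $decoded;
}

// Hypothetical helper: convert a body part to UTF-8 using the charset
// declared in its Content-Type header value.
function body_to_utf8(string $body, string $contentType): string
{
    $charset = 'ASCII'; // assumed fallback when no charset is declared
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        $charset = $m[1];
    }
    try {
        return mb_convert_encoding($body, 'UTF-8', $charset);
    } catch (\ValueError $e) {
        // Charset name unknown to mbstring: return the bytes untouched rather than guess.
        return $body;
    }
}

echo subject_to_utf8('=?ISO-8859-1?Q?caf=E9?='), "\n"; // café

// Simulate a GB2312 body, then decode it from its declared charset.
$gb = mb_convert_encoding("这间面积70平", 'EUC-CN', 'UTF-8');
echo body_to_utf8($gb, 'text/plain; charset="gb2312"'), "\n";
```

Storing the UTF-8 result in the database (with the connection and column also set to UTF-8) means no detection is needed at display time.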

You could use a PHP email parser that returns the email contents in UTF-8.
