Convert utf8/mixed to utf8 and strip non ascii chars

How do I convert UTF-8 strings to ISO 8859-1?

Why doesn't imap_mime_header_decode detect the UTF-8 encoded string?

I need to remove all 4-byte Unicode characters so the string fits in a MySQL utf8 column.

I have tried this, but it doesn't work:

$text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

code

$input = '=?UTF-8?Q?=c3=b8en?=';
echo "$input\n";
$output = '';
foreach(imap_mime_header_decode($input) as $element){
    if($element->charset == 'utf-8'){
        echo "utf8 charset = $element->text\n";
        $output .= $element->text;
    }
    else{
        echo "default charset = $element->text\n";
        $output .= $element->text;
    }
}
// Here output should be iso 8859-1
echo "$output\n";
$string = preg_replace('/[^a-zæøåA-ZÆØÅ0-9 \-\.,:]/', '', $output);
// Back to utf8
$string = utf8_encode($string);
echo "$string\n";

output

=?UTF-8?Q?=c3=b8en?=
default charset = øen
øen
en
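
On the second question, one likely cause is the case-sensitive comparison with 'utf-8'. This assumes imap_mime_header_decode reports the charset exactly as it is spelled in the header (UTF-8 in upper case here). A minimal sketch of a case-insensitive check follows; is_utf8_charset is a hypothetical helper, and the $element object only stands in for one decoded element so the snippet runs without the imap extension:

```php
<?php
// Hypothetical helper: compare the charset name without regard to case.
function is_utf8_charset(string $charset): bool
{
    return strcasecmp($charset, 'utf-8') === 0;
}

// Stand-in for one element of the array returned by imap_mime_header_decode()
// (assumption: the charset is reported as written in the header).
$element = (object) ['charset' => 'UTF-8', 'text' => "\u{00F8}en"];

$label = is_utf8_charset($element->charset) ? 'utf8 charset' : 'default charset';
echo "$label = $element->text\n";
```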

Use htmlentities() to convert the special characters to HTML entities. You can optionally pass the encoding of the source string, and doing so is encouraged; in your case this would be 'UTF-8'. HTML entities are safe to store in a database and safe to output in their escaped form, although you may choose to use html_entity_decode() to convert as many characters as possible back to an encoding of your choice.
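
A small sketch of the round trip described above, assuming UTF-8 input and the HTML 4.01 entity table. The sample strings are made up for illustration. Note that a character with no named entity, such as an emoji, passes through htmlentities() unchanged, so this alone may not remove 4-byte characters:

```php
<?php
// Sample input (illustrative only): "søen & <b>" written with an escape
// so the snippet does not depend on the encoding of the source file.
$text = "s\u{00F8}en & <b>";

// Encode: characters that have a named entity are replaced by it.
$encoded = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $encoded, "\n"; // s&oslash;en &amp; &lt;b&gt;

// Decode back to UTF-8.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// A 4-byte character without a named entity is left as it is.
$emoji = "\u{1F600}";
$emojiEncoded = htmlentities($emoji, ENT_QUOTES | ENT_HTML401, 'UTF-8');
```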

I came up with this solution. It first converts to UTF-8 (including 4-byte Unicode characters), then converts to ISO 8859-1, strips the unwanted characters, and finally encodes back to UTF-8.

:D

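private function strip_non_ascii($string){
    $return = '';
    // MIME encoded-word header (Q encoding), e.g. =?UTF-8?Q?...?=
    if(preg_match('/^=\?(iso-8859-1|utf-8)\?q\?/i', $string)){
        $return = str_replace('_',' ', mb_decode_mimeheader($string));
    }
    // Percent-encoded value prefixed with its charset, e.g. iso-8859-1''...
    elseif(preg_match('/^(iso-8859-1\'\')(.*)$/i', $string, $matches)){
        $return = utf8_encode(rawurldecode($matches[2]));
    }
    else{
        $return = imap_utf8($string);
    }

    // utf8_decode() turns anything outside ISO 8859-1 (including 4-byte characters) into '?',
    // which the whitelist below then removes. The literal æøåÆØÅ must be ISO 8859-1 bytes here.
    // Note: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.
    return utf8_encode(preg_replace('/[^a-zæøåA-ZÆØÅ0-9 \-\.,:]/', '', utf8_decode($return)));
}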
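
If the only goal is the one stated in the question (dropping 4-byte characters so the text fits a MySQL utf8 column), a narrower approach may be enough. The sketch below assumes the input is already valid UTF-8; strip_four_byte_chars is a hypothetical helper name, and the pattern works on raw bytes rather than through the ISO 8859-1 round trip above. For context, mb_convert_encoding($text, 'UTF-8', 'UTF-8') only cleans up invalid byte sequences and keeps valid 4-byte characters, which would explain why that attempt had no effect.

```php
<?php
// Hypothetical helper: remove every 4-byte UTF-8 sequence
// (code points U+10000 and above) from a UTF-8 string.
function strip_four_byte_chars(string $text): string
{
    // A 4-byte sequence starts with a lead byte 0xF0-0xF4
    // followed by three continuation bytes 0x80-0xBF.
    // No /u modifier: the pattern is matched byte by byte.
    $result = preg_replace('/[\xF0-\xF4][\x80-\xBF]{3}/', '', $text);

    // preg_replace() returns null on error; fall back to the input.
    return $result ?? $text;
}

// Usage: the emoji is removed, the 2-byte "ø" is kept.
$clean = strip_four_byte_chars("s\u{00F8}en \u{1F600}!");
```

Characters of up to three bytes, such as æ, ø and å, are left untouched, so no whitelist is needed for this particular step.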

