
How do I properly decode a 7bit email message using PHP?

I have code that reads emails from a server and then parses them before inserting the data into a database.

I am using the IMAP extension in PHP to help me with this.

Here is how I read the data:

//read new messages
private function _getNewMessages(){

    // Checks the inbox
    if ($messages = imap_search($this->conn,'ALL'))
    {
        // Sorts the message numbers in descending order (newest first)
        rsort($messages);

        // Loops through the messages
        foreach ($messages as $id)
        {
            // Grabs the overview and body
            //$overview = imap_fetch_overview($this->conn, $id, 0);
            $struct = imap_fetchstructure($this->conn, $id, 0);
            $header = imap_headerinfo($this->conn, $id);
            $message = imap_fetchbody($this->conn, $id, 1);

            //decode the message
            if(isset($struct->encoding)){
                $message = $this->_decodeMessage($message, $struct->encoding);
            }


            $from = $header->from[0]->mailbox . '@' . $header->from[0]->host;
            $subject = $header->subject;    
            echo  $message;
        }
    }
    else
    {
        exit('No messages to process');
    }
}
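One thing worth checking in the loop above: imap_fetchbody($this->conn, $id, 1) fetches section 1, but $struct->encoding is the encoding of the top-level structure. For a multipart message the top level is typically 7bit (code 0) even when section 1 itself is quoted-printable or base64, so the wrong decoder can be chosen. Below is a minimal sketch of a hypothetical helper, encodingForSection(), that reads the encoding of the fetched section instead. It assumes the usual imap_fetchstructure() shape (nested ->parts arrays of objects with an ->encoding property) and may not match the section numbering used for attached message/rfc822 parts.

```php
<?php
// Hypothetical helper (not part of the IMAP extension): return the numeric
// Content-Transfer-Encoding code for a section such as "1" or "1.2", given
// the object returned by imap_fetchstructure().
function encodingForSection(object $struct, string $section = '1'): int
{
    // A single-part message has no ->parts; its body is section "1" and
    // the encoding is stored on the top-level object.
    if (empty($struct->parts)) {
        return (int) ($struct->encoding ?? 0);
    }

    // Walk down the part tree: "1.2" means parts[0], then its parts[1].
    $node = $struct;
    foreach (explode('.', $section) as $index) {
        $i = (int) $index - 1;
        if (!isset($node->parts[$i])) {
            return 0; // Section not found: fall back to 7bit (code 0).
        }
        $node = $node->parts[$i];
    }

    return (int) ($node->encoding ?? 0);
}
```

In the loop this would be used as $this->_decodeMessage($message, encodingForSection($struct, '1')), so the decoder matches the part that was actually fetched.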

The issue I am seeing is that when a message arrives with encoding value 0 ("7 Bit"), it comes back blank. It seems the decoding step is failing to decode the message properly.
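For reference, the IMAP extension reports encoding 0 as 7bit, which under RFC 2045 means the body is plain US-ASCII that needs no decoding at all, so a blank result for code 0 points at the decoding step rather than at the message. The sketch below is a hypothetical dispatcher, decodeBody(), not the question's _decodeMessage(). It assumes PHP 8 (for match) and uses integer literals that mirror the extension's ENC7BIT through ENCOTHER constants, so it runs without the extension loaded.

```php
<?php
// Hypothetical dispatcher: decode a body according to the numeric encoding
// code reported by imap_fetchstructure(). The literals mirror the IMAP
// constants ENC7BIT = 0, ENC8BIT = 1, ENCBINARY = 2, ENCBASE64 = 3,
// ENCQUOTEDPRINTABLE = 4, ENCOTHER = 5.
function decodeBody(string $body, int $encoding): string
{
    return match ($encoding) {
        // 7bit, 8bit and binary are identity encodings: nothing to undo.
        0, 1, 2 => $body,
        3 => (string) base64_decode($body),
        4 => quoted_printable_decode($body),
        // "Other" or unknown codes: return the raw body unchanged.
        default => $body,
    };
}
```

With this shape, decodeBody("SGVsbG8=", 3) gives "Hello", while a code 0 body is returned exactly as fetched.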

I am using this function to decode the 7-bit message:

    // function to decode 7BIT encoded message
    private function _decode7Bit($text) {
        // If there are no spaces on the first line, assume that the body is
        // actually base64-encoded, and decode it.
        $lines = explode("\r\n", $text);
        $first_line_words = explode(' ', $lines[0]);
        if ($first_line_words[0] == $lines[0]) {
            $text = base64_decode($text);
        }

        // Manually convert common encoded characters into their UTF-8 equivalents.
        $characters = array(
                     '=20' => ' ', // space.
                     '=E2=80=99' => "'", // single quote.
                     '=0A' => "\r\n", // line break.
                     '=A0' => ' ', // non-breaking space.
                     '=C2=A0' => ' ', // non-breaking space.
                     "=\r\n" => '', // joined line.
                     '=E2=80=A6' => '…', // ellipsis.
                     '=E2=80=A2' => '•', // bullet.
        );

        // Loop through the encoded characters and replace any that are found.
        foreach ($characters as $key => $value) {
            $text = str_replace($key, $value, $text);
        }

        return $text;
    }
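The first-line test at the top of _decode7Bit() can by itself produce blank or garbled output. Any body whose first line contains no space, such as a greeting like "Hi," or an empty first line, is treated as base64, and base64_decode() in its default non-strict mode silently skips characters outside the base64 alphabet and decodes whatever remains. The sample body below is made up for illustration, and firstLineHasNoSpaces() is simply the original check pulled out into a hypothetical standalone function.

```php
<?php
// The "no spaces on the first line" test from _decode7Bit(), extracted
// into a standalone function (hypothetical name).
function firstLineHasNoSpaces(string $text): bool
{
    $lines = explode("\r\n", $text);
    $firstLineWords = explode(' ', $lines[0]);
    return $firstLineWords[0] == $lines[0];
}

$body = "Hi,\r\nsee you tomorrow.";

// The heuristic fires, so the plain-text body would be base64-decoded.
var_dump(firstLineHasNoSpaces($body)); // bool(true)

// Non-strict base64_decode() drops ",", CR, LF, spaces and "." and decodes
// the remaining letters, turning readable text into binary noise.
var_dump(base64_decode($body) === $body); // bool(false)
```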

I have also tried this method to decode the message:

/**
 * decoding 7bit strings to ASCII
 * @param string $text
 * @return string
 */
function decode7bit($text){
        $ret = '';
        $data = str_split(pack('H*', $text));

        $mask = 0xFF;
        $shift = 0;
        $carry = 0;
        foreach ($data as $char) {
                if ($shift == 7) {
                        $ret .= chr($carry);
                        $carry = 0;
                        $shift = 0;
                }

                $a      =       ($mask >> ($shift+1)) & 0xFF;
                $b      =       $a ^ 0xFF;

                $digit = ($carry) | ((ord($char) & $a) << ($shift)) & 0xFF;
                $carry = (ord($char) & $b) >> (7-$shift);
                $ret .= chr($digit);

                $shift++;
        }
        if ($carry) $ret .= chr($carry);
        return $ret;
}

But the message is still blank.
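The decode7bit() function above appears to be an SMS (GSM 03.38) septet unpacker: it first turns a hex string into bytes with pack('H*', ...) and then unpacks 7-bit characters that were packed eight septets into seven octets. That is a different thing from the email header Content-Transfer-Encoding: 7bit, which per RFC 2045 only declares that the body is already short lines of US-ASCII and applies no transformation, so the correct "decode" for a genuinely 7bit body is to leave it alone. The sketch below, using the hypothetical name is7BitClean(), checks the RFC 2045 constraints as a diagnostic. It assumes the body still has the CRLF line endings it was delivered with.

```php
<?php
// Hypothetical diagnostic: does $body satisfy RFC 2045 "7bit data", i.e.
// can it be used as-is without any decoding step?
function is7BitClean(string $body): bool
{
    // No NUL bytes and no bytes above 127.
    if (preg_match('/[\x00\x80-\xFF]/', $body)) {
        return false;
    }
    // CR and LF may only appear together, as CRLF.
    if (preg_match('/\r(?!\n)|(?<!\r)\n/', $body)) {
        return false;
    }
    // Each line is at most 998 octets, not counting the CRLF.
    foreach (explode("\r\n", $body) as $line) {
        if (strlen($line) > 998) {
            return false;
        }
    }
    return true;
}
```

A body that passes can be used as fetched. If it still contains sequences such as =20 or =E2=80=99, it is most likely quoted-printable content, for example a multipart section whose own encoding differs from the top-level one.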

What am I doing wrong here? What can I do to make sure the message is decoded properly?

Thank you

I figured out the issue. The way I was checking whether the message is base64-encoded was not good, so I changed my function to this:

private function _isEncodedBase64($data){

        if ( base64_encode(base64_decode($data)) === $data){
            return true;
        }

        return false;
    }

    // function to decode 7BIT encoded message
    private function _decode7Bit($text) {
        // If the body passes the base64 round-trip check, assume it is
        // actually base64-encoded, and decode it.
        if($this->_isEncodedBase64($text)){
            $text = base64_decode($text);
        } 


        // Manually convert common encoded characters into their UTF-8 equivalents.
        $characters = array(
                     '=20' => ' ', // space.
                     '=E2=80=99' => "'", // single quote.
                     '=0A' => "\r\n", // line break.
                     '=A0' => ' ', // non-breaking space.
                     '=C2=A0' => ' ', // non-breaking space.
                     "=\r\n" => '', // joined line.
                     '=E2=80=A6' => '…', // ellipsis.
                     '=E2=80=A2' => '•', // bullet.
        );

        // Loop through the encoded characters and replace any that are found.
        foreach ($characters as $key => $value) {
            $text = str_replace($key, $value, $text);
        }
        return $text;
    }
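Guessing the encoding from the content stays fragile even with the round-trip check: it accepts any plain text that uses only base64 characters in a multiple of four, such as a one-word body "Test". It also rejects real MIME base64 bodies unless the line breaks are removed first, because RFC 2045 wraps encoded lines at no more than 76 characters. The standalone sketch below, with the hypothetical name roundTripsAsBase64(), strips whitespace and uses strict decoding to illustrate both points; taking the encoding code from imap_fetchstructure() avoids the guess altogether.

```php
<?php
// Standalone variant of the round-trip check (hypothetical name).
function roundTripsAsBase64(string $data): bool
{
    // Remove the CRLF line wrapping that MIME base64 bodies carry.
    $clean = preg_replace('/\s+/', '', $data);
    // Strict mode returns false on characters outside the base64 alphabet.
    $decoded = base64_decode($clean, true);
    return $decoded !== false && base64_encode($decoded) === $clean;
}

var_dump(roundTripsAsBase64(base64_encode('Hello, world'))); // bool(true)
var_dump(roundTripsAsBase64('Hello, world'));                // bool(false)
var_dump(roundTripsAsBase64('Test'));                        // bool(true): plain text misread as base64
```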

Based on the escape sequences you are replacing (=20, =E2=80=99, the "=" before a line break, and so on), your text appears to be encoded as quoted-printable.

Try decoding using:

echo quoted_printable_decode($text);
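A quick check that quoted_printable_decode() covers everything in the hand-written table, including soft line breaks. The sample string is made up for illustration. Note that =E2=80=99 decodes to the real UTF-8 right single quotation mark (U+2019) rather than an ASCII apostrophe, and the output stays in the part's declared charset, so a part that is not UTF-8 may still need converting afterwards (for example with mb_convert_encoding()).

```php
<?php
// Made-up quoted-printable sample: an encoded apostrophe, a soft line
// break ("=" at the end of a line) and an encoded space.
$qp = "It=E2=80=99s a long line that was =\r\nsoft-wrapped=20here.";

$text = quoted_printable_decode($qp);

// "\u{2019}" is the character whose UTF-8 bytes are E2 80 99.
var_dump($text === "It\u{2019}s a long line that was soft-wrapped here."); // bool(true)
```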

You don't need all this code. Just use PHP's quoted_printable_decode() and you're done.


 