Can't read Outlook mail with JavaMail, while Gmail works

Basically, I wrote an application which reads emails from an inbox. I've always tested the application with e-mail sent from Gmail. But now when I am trying to read an e-mail which was sent from Outlook, I am not getting any content back.

I logged the content types of both e-mails:

Gmail returns: multipart/alternative; boundary=047d7b342bf2b6847f04d11df78a

Outlook returns: text/html; charset=iso-8859-1

Note: These are the same e-mail, just sent from different mail clients.

The content of the Gmail message is an instance of Multipart, while the content of the Outlook message is an instance of String.

My code:

The method that checks whether the message content is an instance of Multipart or String:

public void getContent(Message msg) throws IOException, Exception {

    Object contt = msg.getContent();
    System.out.println("Contenttype: " + msg.getContentType());

    if (contt instanceof Multipart) {
        checkDisposition = true;
        handleMultipart((Multipart) contt);
    } else if (contt instanceof String) {   
       handlePart((Part) msg);
    }
    prepareEmail(mpMessage);
}

If the message is multipart this method will be called:

public void handleMultipart(Multipart multipart)
        throws MessagingException, IOException, Exception {
    mpMessage = getText(multipart.getBodyPart(0));

    for (int z = 1, n = multipart.getCount(); z < n; z++) {
        handlePart(multipart.getBodyPart(z));

    }
}

If the message isn't multipart, this method is called directly:

public void handlePart(Part part)
        throws MessagingException, IOException, Exception {



    Object con = messageCopy.getContent();

    String disposition = part.getDisposition();
    String contentType = part.getContentType();

    if (checkDisposition) {


        if (disposition == null) {

            System.out.println("Disposition is null");

        } else if (disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
            System.out.println("Attachment: " + part.getFileName()
                    + " : " + contentType);
            input = part.getInputStream();
            bytes = IOUtils.toByteArray(input);
        } else if (disposition.equalsIgnoreCase(Part.INLINE)) {
            System.out.println("Inline: "
                    + part.getFileName()
                    + " : " + contentType);
        } else {
            System.out.println("Other: " + disposition);
        }
    }else{
        mpMessage = part.getContent().toString(); //returns nothing



        System.out.println("mpMessage handlePart "+mpMessage); //returns nothing
        System.out.println("mpMessage handlePart "+part.getLineCount()); //returns 0
        System.out.println("mpMessage handlePart "+part.getContentType()); //returns text/html; charset=iso-8859-1
        System.out.println("mpMessage handlePart "+part.getSize()); // returns 22334
        part.writeTo(System.out); //See below

    }

}

The method which returns the text from the parts:

private String getText(Part p) throws
        MessagingException, IOException {

    System.out.println("getText contentType "+p.getContentType());

    // This branch runs when reading an Outlook mail. It's not clear to me how to
    // retrieve the text from the part, since `p.getContent()` returns nothing.
    if (p.isMimeType("text/*")) {
        String s = (String) p.getContent();
        System.out.println();
        return String.valueOf(s);
    }

    if (p.isMimeType("multipart/alternative")) {
        Multipart mp = (Multipart) p.getContent();
        String text = null;
        for (int i = 0; i < mp.getCount(); i++) {
            Part bp = mp.getBodyPart(i);
            if (bp.isMimeType("text/plain")) {
                String s = getText(bp);
                if (s != null) {
                    return s;
                }
            }
        }
        return text;
    }
    return null;
}

part.writeTo(System.out) prints:

Received: from AMSPRD0710HT005.eurprd07.prod.outlook.com Server (TLS) id 00000; Thu, 20 Dec 2012 09:28:23 +0000
Received: from AMSPRD0710MB354.eurprd07.prod.outlook.com ([00.000.0000]) by AMSPRD0710HT005.eurprd07.prod.outlook.com ([00.000.0000]) with mapi id 14.16.0245.002; Thu, 20 Dec 2012 09:28:05 +0000
From: test
To: support
Subject: Verwerkingsverslag Kenmerk: 0824496
Thread-Topic: Verwerkingsverslag Kenmerk: 0824496
Thread-Index: Ac3elFC2qYsSo+SOT2ii4HnbCCqgVw==
Date: Thu, 20 Dec 2012 10:28:05 +0100
Message-ID:...

And so on.

The content of the message itself gets returned as HTML code, not just normal text.

How do I retrieve the plain text from the Outlook email, instead of the HTML code? Or how do I retrieve the content of the part in handlePart?

Any help is appreciated,

Thanks!

You seem to be assuming that Outlook sent a plain-text version along with the HTML version, which does not appear to be the case. The MIME type you logged for the Outlook email is text/html, which indicates that the message is a single HTML-formatted document. The Gmail version, on the other hand, was sent as multipart/alternative, which indicates that the same message may be present in several formats (plain text and HTML; I believe this is the default behaviour for Gmail). Thus, if you are getting the HTML-encoded version, you are getting the "text" of the email just as it was sent.

There is no requirement that emails be sent with a plain-text version or, indeed, with any other format. It is up to you to ensure that the mail client is sending the email in a format that your consuming program can handle or to change the consuming program to handle the formats being sent.
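To make the difference between the two logged headers concrete, here is a minimal sketch in plain Java (no JavaMail needed) that splits a Content-Type value into its MIME type and parameters. The class ContentTypeInfo and its members are hypothetical names made up for this illustration. In real code, JavaMail's Part.isMimeType() and javax.mail.internet.ContentType already cover this and handle quoting rules that the sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only: a simplified Content-Type parser.
// It splits on ';' blindly, so a semicolon inside a quoted value would break it.
final class ContentTypeInfo {
    final String mimeType;                 // lower-cased, e.g. "text/html"
    final Map<String, String> parameters;  // e.g. "charset" -> "iso-8859-1"

    private ContentTypeInfo(String mimeType, Map<String, String> parameters) {
        this.mimeType = mimeType;
        this.parameters = parameters;
    }

    static ContentTypeInfo parse(String header) {
        String[] pieces = header.split(";");
        String mimeType = pieces[0].trim().toLowerCase(Locale.ROOT);
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < pieces.length; i++) {
            int eq = pieces[i].indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair, skip it
            }
            String name = pieces[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = pieces[i].substring(eq + 1).trim();
            // Strip optional surrounding quotes; escaped quotes are not handled.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(name, value);
        }
        return new ContentTypeInfo(mimeType, params);
    }

    // A multipart message has sub-parts; anything else is a single body.
    boolean isMultipart() {
        return mimeType.startsWith("multipart/");
    }

    public static void main(String[] args) {
        ContentTypeInfo gmail =
                parse("multipart/alternative; boundary=047d7b342bf2b6847f04d11df78a");
        ContentTypeInfo outlook = parse("text/html; charset=iso-8859-1");
        System.out.println(gmail.mimeType + " multipart=" + gmail.isMultipart());
        System.out.println(outlook.mimeType + " charset=" + outlook.parameters.get("charset"));
    }
}
```

The consuming program can branch on the parsed type: a multipart/* value means there are sub-parts to walk, while a single text/html body has to be handled as HTML because no plain-text alternative exists.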

In addition to the above, you may want to reconsider this line:

mpMessage = getText(multipart.getBodyPart(0));

This appears to assume that the first part of a multipart message is always a plain-text document containing the text of the message. That might be a bad assumption.
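Selecting the body by type rather than by position can be sketched without JavaMail. The classes AlternativePicker and SimplePart below are hypothetical stand-ins (a part is reduced to a MIME type and its text). With JavaMail, the same loop would run over multipart.getBodyPart(i) and test each part with isMimeType().

```java
import java.util.Arrays;
import java.util.List;

final class AlternativePicker {
    // Hypothetical stand-in for a mail body part: just a MIME type and its text.
    static final class SimplePart {
        final String mimeType;
        final String text;

        SimplePart(String mimeType, String text) {
            this.mimeType = mimeType;
            this.text = text;
        }
    }

    // Returns the text of the text/plain part if there is one, otherwise the
    // text of the first text/html part, otherwise null. The position of a
    // part in the list is never assumed.
    static String pickBody(List<SimplePart> parts) {
        String html = null;
        for (SimplePart part : parts) {
            if (part.mimeType.equalsIgnoreCase("text/plain")) {
                return part.text;
            }
            if (html == null && part.mimeType.equalsIgnoreCase("text/html")) {
                html = part.text;
            }
        }
        return html;
    }

    public static void main(String[] args) {
        List<SimplePart> htmlFirst = Arrays.asList(
                new SimplePart("text/html", "<p>Hello</p>"),
                new SimplePart("text/plain", "Hello"));
        System.out.println(pickBody(htmlFirst)); // prints "Hello"

        List<SimplePart> htmlOnly = Arrays.asList(
                new SimplePart("text/html", "<p>Hello</p>"));
        System.out.println(pickBody(htmlOnly)); // prints "<p>Hello</p>"
    }
}
```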


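So, assuming you have actually retrieved the mail message with the HTML content, getContent() shouldn't be returning null or an empty string. According to the documentation for MimeBodyPart#getContent(), the type of the returned object depends on the content: text content normally comes back as a String, and content types unknown to the DataHandler system come back as an InputStream. Either way, reading the stream from getInputStream() should let you produce a string containing the HTML, tags included.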
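Reading a stream into a string can be sketched with plain Java I/O. The class StreamText and its readAll method are hypothetical names for this illustration, and the byte array in main stands in for a real message body. With JavaMail, the stream would come from part.getInputStream() and the charset from the charset parameter of the part's Content-Type (iso-8859-1 for the Outlook message above).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamText {
    // Decodes the whole stream into a String using the given charset.
    // The entire content is held in memory, so this suits small messages only.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a message body encoded as ISO-8859-1.
        byte[] raw = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        String html = readAll(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);
        System.out.println(html);
    }
}
```

Decoding with the wrong charset would not fail here; it would silently garble non-ASCII characters, so the charset should come from the message header rather than from a platform default.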

Since you don't seem to care about the HTML, only about the content, the process can be greatly simplified by using a Java HTML parsing library such as Jsoup. You can integrate this into your current code by changing getText() to something like this:

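import java.io.IOException;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Part;
import org.jsoup.Jsoup;

private String getText(Part p) throws MessagingException, IOException {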
    System.out.println("getText contentType "+p.getContentType());
    if (p.isMimeType("text/plain")) {
        String s = (String) p.getContent();
        System.out.println(s);
        return s;
    } else if (p.isMimeType("text/html")) {
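        // Use the charset from the Content-Type header; if it is null, Jsoup looks
        // for a <meta> charset and otherwise falls back to UTF-8.
        String charset = new javax.mail.internet.ContentType(p.getContentType()).getParameter("charset");
        // The base URI should not be null; an empty string is enough when no links are resolved.
        String s = Jsoup.parse(p.getInputStream(), charset, "").text();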
        System.out.println(s);
        return s;
    } else if (p.isMimeType("multipart/alternative")) {
        Multipart mp = (Multipart) p.getContent();
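        // The sub-parts are alternative renderings of the same message, so return
        // only one of them: prefer text/plain, otherwise fall back to another text/* part.
        String fallback = null;
        for (int i = 0; i < mp.getCount(); i++) {
            Part bp = mp.getBodyPart(i);
            if (bp.isMimeType("text/plain")) {
                return getText(bp);
            } else if (fallback == null && bp.isMimeType("text/*")) {
                fallback = getText(bp);
            }
        }
        return fallback;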
    }
    return null;
}

Note that this assumes that the email is small enough to be read and parsed entirely in memory.


 