
Can't read Outlook mail with JavaMail, while Gmail works

Basically, I wrote an application which reads emails from an inbox. I've always tested the application with e-mail sent from Gmail. But now, when I try to read an e-mail that was sent from Outlook, I get no content back.

I logged the content types of both e-mails.

Gmail returns: multipart/alternative; boundary=047d7b342bf2b6847f04d11df78a

Outlook returns: text/html; charset=iso-8859-1

Note: These are the same e-mails, just sent from different mail clients.
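The two logged values differ both in the media type and in the parameter that follows it. As a rough illustration of how such a header value breaks down, here is a stdlib-only sketch. The class `ContentTypeSketch` and its methods are hypothetical helpers written for this example, not part of JavaMail (which provides `javax.mail.internet.ContentType` for real parsing), and the splitting is naive: it does not handle quoted parameter values that contain semicolons.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of JavaMail.
final class ContentTypeSketch {

    // Returns the media type, e.g. "text/html", lower-cased.
    static String baseType(String header) {
        int semi = header.indexOf(';');
        String base = semi < 0 ? header : header.substring(0, semi);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the parameters after the media type, keyed by lower-cased name.
    static Map<String, String> parameters(String header) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] pieces = header.split(";");
        for (int i = 1; i < pieces.length; i++) {
            int eq = pieces[i].indexOf('=');
            if (eq < 0) {
                continue; // skip malformed pieces without '='
            }
            String name = pieces[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = pieces[i].substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(name, value);
        }
        return params;
    }

    // True for any multipart/* media type.
    static boolean isMultipart(String header) {
        return baseType(header).startsWith("multipart/");
    }
}
```

With the values from the question, `isMultipart` is true for the Gmail header and false for the Outlook one, and `parameters(...).get("charset")` yields `iso-8859-1` for the Outlook header.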

Mail from Gmail will be an instance of Multipart, while the Outlook e-mail will be an instance of String.

My code:

The method which checks whether the message content is an instance of Multipart or String:

public void getContent(Message msg) throws IOException, Exception {

    Object contt = msg.getContent();
    System.out.println("Contenttype: " + msg.getContentType());

    if (contt instanceof Multipart) {
        checkDisposition = true;
        handleMultipart((Multipart) contt);
    } else if (contt instanceof String) {   
       handlePart((Part) msg);
    }
    prepareEmail(mpMessage);
}

If the message is multipart, this method will be called:

public void handleMultipart(Multipart multipart)
        throws MessagingException, IOException, Exception {
    mpMessage = getText(multipart.getBodyPart(0));

    for (int z = 1, n = multipart.getCount(); z < n; z++) {
        handlePart(multipart.getBodyPart(z));

    }
}

If the message isn't multipart, this method will be called directly:

public void handlePart(Part part)
        throws MessagingException, IOException, Exception {



    Object con = messageCopy.getContent();

    String disposition = part.getDisposition();
    String contentType = part.getContentType();

    if (checkDisposition) {


        if (disposition == null) {

            System.out.println("Disposition is null");

        } else if (disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
            System.out.println("Attachment: " + part.getFileName()
                    + " : " + contentType);
            input = part.getInputStream();
            bytes = IOUtils.toByteArray(input);
        } else if (disposition.equalsIgnoreCase(Part.INLINE)) {
            System.out.println("Inline: "
                    + part.getFileName()
                    + " : " + contentType);
        } else {
            System.out.println("Other: " + disposition);
        }
    }else{
        mpMessage = part.getContent().toString(); //returns nothing



        System.out.println("mpMessage handlePart "+mpMessage); //returns nothing
        System.out.println("mpMessage handlePart "+part.getLineCount()); //returns 0
        System.out.println("mpMessage handlePart "+part.getContentType()); //returns text/html; charset=iso-8859-1
        System.out.println("mpMessage handlePart "+part.getSize()); // returns 22334
        part.writeTo(System.out); //See below

    }

}

The method which returns the text from the parts:

private String getText(Part p) throws
        MessagingException, IOException {

    System.out.println("getText contentType "+p.getContentType());

    // This branch is reached when reading an Outlook mail. It's not clear to me how to retrieve the text from the part, since `p.getContent()` returns nothing.
    if (p.isMimeType("text/*")) {
        String s = (String) p.getContent();
        System.out.println();
        return String.valueOf(s);
    }

    if (p.isMimeType("multipart/alternative")) {
        Multipart mp = (Multipart) p.getContent();
        String text = null;
        for (int i = 0; i < mp.getCount(); i++) {
            Part bp = mp.getBodyPart(i);
            if (bp.isMimeType("text/plain")) {
                String s = getText(bp);
                if (s != null) {
                    return s;
                }
            }
        }
        return text;
    }
    return null;
}

part.writeTo(System.out) returns:

Received: from AMSPRD0710HT005.eurprd07.prod.outlook.com Server (TLS) id 00000; Thu, 20 Dec 2012 09:28:23 +0000
Received: from AMSPRD0710MB354.eurprd07.prod.outlook.com ([00.000.0000]) by AMSPRD0710HT005.eurprd07.prod.outlook.com ([00.000.0000]) with mapi id 14.16.0245.002; Thu, 20 Dec 2012 09:28:05 +0000
From: test
To: support
Subject: Verwerkingsverslag Kenmerk: 0824496
Thread-Topic: Verwerkingsverslag Kenmerk: 0824496
Thread-Index: Ac3elFC2qYsSo+SOT2ii4HnbCCqgVw==
Date: Thu, 20 Dec 2012 10:28:05 +0100
Message-ID:...

And so on.

The content of the message itself gets returned as HTML code, not just normal text.

How do I retrieve the plain text from the Outlook email, instead of the HTML code? Or how do I retrieve the content of the part in handlePart?

Any help is appreciated,

Thanks!

You seem to be assuming that Outlook sent the plain text along with the HTML version, which does not appear to be the case. The MIME type of the email you logged from Outlook is text/html, which indicates that it is just an HTML-formatted document. The Gmail version, on the other hand, was sent as multipart/alternative, which indicates that the same document may contain multiple versions of the email (plain text and HTML; I believe this is Gmail's default behaviour). Thus, if you are getting the HTML-encoded version, you are getting the "text" of the email just as it was sent.

There is no requirement that emails be sent with a plain-text version or, indeed, in any other particular format. It is up to you either to ensure that the mail client sends the email in a format that your consuming program can handle, or to change the consuming program to handle the formats being sent.

In addition to the above, you may want to reconsider this line:

mpMessage = getText(multipart.getBodyPart(0));

This appears to assume that the first part of the multipart message will be a plain-text document containing the text of the message. That might be a bad assumption.
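One way to avoid depending on part order is to select the body by MIME type rather than by index. The following is a minimal stdlib-only sketch of that idea; `PartSketch` and `BodySelector` are hypothetical stand-ins invented for this example (real code would walk `javax.mail.Part` objects and call `isMimeType`), and preferring text/plain over text/html is an assumption about what the consuming program wants.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a body part; real code would use javax.mail.Part.
final class PartSketch {
    final String mimeType;
    final String text;

    PartSketch(String mimeType, String text) {
        this.mimeType = mimeType;
        this.text = text;
    }
}

// Hypothetical helper that picks a body by type instead of by position.
final class BodySelector {

    // Returns the text of the first part whose type starts with `wanted`,
    // or null if no part matches.
    static String firstOfType(List<PartSketch> parts, String wanted) {
        for (PartSketch part : parts) {
            if (part.mimeType.toLowerCase(Locale.ROOT).startsWith(wanted)) {
                return part.text;
            }
        }
        return null;
    }

    // Prefers text/plain and falls back to text/html, regardless of part order.
    static String preferredBody(List<PartSketch> parts) {
        String plain = firstOfType(parts, "text/plain");
        return plain != null ? plain : firstOfType(parts, "text/html");
    }
}
```

With this approach a message whose first part is HTML still yields the plain-text alternative when one exists, and an HTML-only message yields the HTML for further processing.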


So, assuming you have actually gotten the mail message with the HTML content, getContent() shouldn't be returning null or an empty string. According to the documentation on MimeBodyPart#getContent(), text content is usually returned as a String, and an InputStream is returned for content types the DataHandler system does not know. Either way, reading it should enable you to produce a string with the HTML tags.
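When the raw stream is read by hand, the bytes have to be decoded with the charset declared in the Content-Type header (iso-8859-1 for the Outlook mail in the question). Below is a small stdlib-only sketch of that step. `StreamDecoder.readAll` is a hypothetical helper, the ISO-8859-1 fallback is a choice made for this sketch, and in real code the stream would come from `part.getInputStream()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of JavaMail.
final class StreamDecoder {

    // Reads the whole stream and decodes it with the named charset.
    // Falls back to ISO-8859-1 when the name is null, illegal, or unsupported.
    static String readAll(InputStream in, String charsetName) {
        Charset charset = StandardCharsets.ISO_8859_1;
        if (charsetName != null) {
            try {
                charset = Charset.forName(charsetName);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep the fallback.
            }
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), charset);
    }
}
```

The result is the HTML source as a String, tags included, which can then be handed to an HTML parser.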

Since you don't seem to care about the HTML, only the content, the process can be greatly simplified by using a Java HTML parsing library such as Jsoup. Basically, you can integrate this into your current code by changing getText() to something like this:

private String getText(Part p) throws MessagingException, IOException {
    System.out.println("getText contentType "+p.getContentType());
    if (p.isMimeType("text/plain")) {
        String s = (String) p.getContent();
        System.out.println(s);
        return s;
    } else if (p.isMimeType("text/html")) {
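        // A null charsetName lets Jsoup detect the charset from the document (falling
        // back to UTF-8); pass the charset from the Content-Type header if decoding
        // looks wrong. The base URI is "" because Jsoup does not accept a null one.
        String s = Jsoup.parse(p.getInputStream(), null, "").text();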
        System.out.println(s);
        return s;
    } else if (p.isMimeType("multipart/alternative")) {
        Multipart mp = (Multipart) p.getContent();
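        // The alternatives are renderings of the same message, so return the first
        // one that yields text instead of concatenating them all.
        for (int i = 0; i < mp.getCount(); i++) {
            Part bp = mp.getBodyPart(i);
            if (bp.isMimeType("text/*")) {
                String s = getText(bp);
                if (s != null && !s.isEmpty()) {
                    return s;
                }
            }
        }
        return "";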
    }
    return null;
}

Note that this assumes that the email is small enough to be read and parsed entirely in memory.
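If that assumption is a concern, one option is to cap how much is read before parsing. This is a hedged stdlib-only sketch of such a guard; `BoundedReader.readUpTo` is a hypothetical helper, and the limit to use is up to the application (the part in the question reported a size of 22334 bytes).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of JavaMail or Jsoup.
final class BoundedReader {

    // Reads the whole stream, but throws IllegalStateException as soon as
    // the content turns out to be larger than maxBytes.
    static byte[] readUpTo(InputStream in, int maxBytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int read;
            while ((read = in.read(chunk)) != -1) {
                if (buffer.size() + read > maxBytes) {
                    throw new IllegalStateException("content exceeds " + maxBytes + " bytes");
                }
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

The returned bytes can then be decoded and parsed as before, while oversized messages are rejected before they are held in memory in full.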
