
Processing Raw Email Data Using Java

I have a DB that stores raw email contents. My requirement is to fetch individual emails from the DB and process that data to extract the basic details of each email (such as FROM, TO, SUBJECT, etc.), and also to save all the attachments to the file system using Core Java. Currently I am able to fetch the raw email data from the DB as a String, but I am not able to process that data.

How can I process this raw email data (String data type) using Java?

Edit: At the DB level the data is stored as NCLOB. After being fetched from the DB, it is stored as a Java String.

A sample of the email data:

Return-Path: <support.bpm@mydomain>
Delivered-To: faxhealthuat@mydomain.com
Received: from naplmailer2.com (unknown [172.25.3.5])
    by mail3.mydomain.com (Postfix) with ESMTP id 46E6572049B
    for <faxhealthuat@mydomain.com>; Tue, 23 Feb 2016 15:16:43 +0530 (IST)
DKIM-Signature: v=1; a=rsa-sha256; d=mydomain; s=sms2; c=relaxed/simple;
    q=dns/txt; i=@mydomain; t=1456220806; x=1458812806;
    h=From:Sender:Reply-To:Subject:Date:Message-ID:To:Cc:MIME-Version:Content-Type:
    Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:Resent-From:
    Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Id:
    List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
    bh=K7Tc1XHEFN5ey8WU6/HXHF9XYDMLCiIsVdU7DloptqI=;
    b=CEnhtyGSQi+08wghYzKjW61JpO/IqOCgjopdCaesEfRgdeu86BWTQ9ZV0G7mCkDz
    XChXBhzNsj+uST6yiu7ivYsCBqKvBAnyaoUvLSUw5rWAuCNlg1gdP1ilEzFnZZBB
    6U25CK64N81I5cKCdltgmUe5B97XueIV8M8LjhyemxM=;
X-AuditID: 7370fb5c-f79a16d000001484-b0-56cc2a86383c
Received: from CHNMURROOTCAS2.murugappa.com ( [172.25.1.14])
    by naplmailer2.com (Symantec Messaging Gateway) with SMTP id 8B.42.05252.68A2CC65; Tue, 23 Feb 2016 15:16:46 +0530 (IST)
Received: from CHNMURROOTMBX2.murugappa.com ([fe80::a141:6b81:60c9:125c]) by
 CHNMURROOTCAS2.murugappa.com ([fe80::fc6b:b33c:6d4f:fadd%12]) with mapi id
 14.03.0210.002; Tue, 23 Feb 2016 15:16:40 +0530
From: Support-BPM-CholaMS <support.bpm@mydomain>
To: "faxhealthuat@mydomain.com" <faxhealthuat@mydomain.com>
Subject: Test From Mail
Thread-Topic: Test From Mail
Thread-Index: AdFuHx8uv6VR8hDtQvKILSCahVrrMg==
Date: Tue, 23 Feb 2016 09:46:39 +0000
Message-ID: <B8C5C607CDD50E4D84DACA129D4CFD64C7299C49@CHNMURROOTMBX2.murugappa.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.111.10.60]
Content-Type: multipart/alternative;
    boundary="_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_"
MIME-Version: 1.0
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprMKsWRmVeSWpSXmKPExsWyRpKRT7dN60yYwe2HihYvDps7MHqs73jD
    GsAY1cBok5iXl1+SWJKqkJJanGyr5JJZnJyTmJmbWqSQll+k4JyRn5Oo4BuspJCZYqtkqqRQ
    kJOYnJqbmldiq5RYUJCal6Jkx6WAAWyAyjLzFFLzkvNTMvPSbZU8g/11LSxMLXUNlexcPIOd
    fRw9fV2DFPz8E7ayZjx+spe54LdqxeLPS9kbGBcodzFyckgImEicOvSNFcIWk7hwbz1bFyMX
    h5DAdkaJdcd3QjmnGSU+z17PCFLFJmArseJgM5gtIuAocezPNxYQW1hAXGLdxFesEHEZieWH
    l0DZehLnzl5lA7FZBFQljhzoZQaxeQWCJW7seAZWwwi0+fupNUwgNjPQnFtP5jNBXCQgsWTP
    eWYIW1Ti5eN/UJcqSLR+PwUU5wCqz5fY8cEYYqSgxMmZT1gmMArNQjJpFkLVLCRVECU6Egt2
    f2KDsLUlli18zQxjnznwmAlZfAEj+ypG/rzEgpzcxMyc1CIjveT83E2MwJgvLvgds4Px00+n
    Q4wCHIxKPLzLG06HCbEmlhVX5h5ilOBgVhLhdeA7EybEm5JYWZValB9fVJqTWnyI0QcYIhOZ
    pUST84HpKK8k3tDI3MzQzMTY0NDc2BKHsJI4b6v84TAhgXRgaspOTS1ILYIZx8TBKdXAWDgr
    40nv+6kRyxcq/0qx//f+zokw3qrXR/M3XLflqeaaHnpi6YXDN39mzZhiMLv6DceSuWerT1xS
    SrXbcnaX/LOcj/pu9XFreqSf3lJ9lfYpY/3x2BW/+wofCb7749Fzfv3j/emHsy6/eO+X4LGs
    /4fGYpbrB0733TjNmyKzQWnjBP93PfbzFnEqsRRnJBpqMRcVJwIArc+Y8CYDAAA=

--_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_
Content-Type: text/plain; charset="us-ascii"
content-transfer-encoding: quoted-printable

Testing for from mail fetch

--_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_
Content-Type: text/html; charset="us-ascii"
content-transfer-encoding: quoted-printable
--_000_B8C5C607CDD50E4D84DACA129D4CFD64C7299C49CHNMURROOTMBX2m_--

Assuming the string you fetch still contains its newline delimiters, you can split it into lines and then split each header line into a name and a value:

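import java.util.HashMap;
import java.util.Map;

String rawEmail = "YOUR EMAIL CONTENTS";
String[] lines = rawEmail.split("\\r?\\n");
Map<String, String> attributes = new HashMap<>();
for (String line : lines)
{
    if (line.isEmpty())
    {
        break; // an empty line ends the header section
    }
    // Split at the first colon only: values such as dates contain colons too.
    // Lines starting with whitespace are continuation lines and are skipped here.
    int colon = line.indexOf(':');
    if (colon > 0 && !Character.isWhitespace(line.charAt(0)))
    {
        String value = line.substring(colon + 1).trim();
        attributes.put(line.substring(0, colon).trim(), value.isEmpty() ? null : value);
    }
}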

Nested attributes can be processed further in the same way.

Well, if you want to parse an email message, you need to know the format of an email message. It was originally defined in RFC 822, which was obsoleted by RFC 2822, which was in turn obsoleted by RFC 5322. You should read those documents first and decide which parts of the format you want to be able to process.

At the highest level, a message is composed of lines. Those lines should be terminated with \r\n (CRLF), but you should not rely on that, since you are getting your messages from a DB without knowing whether any pre-processing has occurred. First comes a header (containing header lines), optionally followed by a body separated from the header by an empty line.
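
The header/body split described above can be sketched with nothing but the standard library. This is a minimal illustration, not a full RFC 5322 parser: the class name `RawMessageSplitter` and the method `splitHeaderAndBody` are made up for this example, and the code assumes the message starts with at least one header line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RawMessageSplitter {

    // The first empty line, whether the stored text uses CRLF or bare LF.
    private static final Pattern EMPTY_LINE = Pattern.compile("\\r?\\n\\r?\\n");

    /** Returns {headerBlock, body}; the body is "" when no empty line is found. */
    static String[] splitHeaderAndBody(String raw) {
        Matcher m = EMPTY_LINE.matcher(raw);
        if (!m.find()) {
            return new String[] { raw, "" };   // header only, no body
        }
        return new String[] { raw.substring(0, m.start()), raw.substring(m.end()) };
    }

    public static void main(String[] args) {
        String raw = "From: a@example.com\r\nSubject: Hi\r\n\r\nBody text\r\n";
        String[] parts = splitHeaderAndBody(raw);
        System.out.println(parts[0]);
        System.out.println("---");
        System.out.println(parts[1]);
    }
}
```

Accepting both `\r\n` and `\n` is deliberate here, because the line endings may have been rewritten on the way into or out of the DB.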

Header lines are of the form HEADER_NAME:HEADER_VALUE, where the header name must not begin with a space. In the header part, any line beginning with a space (or a tab) is a continuation line and must be concatenated to the value of the previous line.
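
A rough sketch of those two rules, splitting at the first colon and joining continuation lines, might look like the following. The names `HeaderUnfolder` and `parseHeaders` are hypothetical, and the sketch makes two simplifications that are assumptions on my part: the leading whitespace of a continuation line is collapsed to a single space, and header names are kept exactly as written even though they are case-insensitive. Values are collected in a list because headers such as `Received` appear more than once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HeaderUnfolder {

    /** Parses a header block into name -> values, joining continuation lines. */
    static Map<String, List<String>> parseHeaders(String headerBlock) {
        Map<String, List<String>> headers = new LinkedHashMap<>();
        String name = null;                       // header currently being built
        StringBuilder value = new StringBuilder();
        for (String line : headerBlock.split("\\r?\\n")) {
            if (line.isEmpty()) {
                break;                            // empty line: the header part is over
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                if (name != null) {               // continuation of the previous header
                    value.append(' ').append(line.trim());
                }
                continue;
            }
            flush(headers, name, value);          // store the header that just ended
            int colon = line.indexOf(':');
            if (colon <= 0) {                     // malformed line: skip it
                name = null;
                continue;
            }
            name = line.substring(0, colon).trim();
            value = new StringBuilder(line.substring(colon + 1).trim());
        }
        flush(headers, name, value);              // store the last header
        return headers;
    }

    private static void flush(Map<String, List<String>> headers, String name, StringBuilder value) {
        if (name != null) {
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value.toString());
        }
    }

    public static void main(String[] args) {
        String block = "Received: from naplmailer2.com (unknown [172.25.3.5])\r\n"
                + "    by mail3.mydomain.com (Postfix) with ESMTP id 46E6572049B\r\n"
                + "Subject: Test From Mail\r\n"
                + "Date: Tue, 23 Feb 2016 09:46:39 +0000\r\n";
        parseHeaders(block).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Splitting at the first colon only matters for values like the `Date` header, which contain colons themselves.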

For more details, refer to RFC 5322.

Well, after doing some research based on your answers and comments, I got what I needed. Thank you all for your efforts.

I am sharing the result here. The Java method below fetches the raw email data from the database, finds all the attachments contained in the email, saves them to the file system, and finally returns either a success or a failure message.

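// Required imports (JavaMail API plus the standard library)
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Part;
import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;

public static String saveAttachments(String EMAIL_ID)
{
    try
    {
        String saveDirectory = "C:\\Email\\Attachements\\";

        //Get email record from DB
        EMAIL newEMAILObj = EMAIL.getEmailDetailsForEmailId(EMAIL_ID);

        //Get email raw data into a String variable
        String emailRawData = newEMAILObj.getCONTENT();

        //Parse the raw data; the explicit charset avoids depending on the platform default
        Session newSession = Session.getDefaultInstance(new Properties());
        InputStream inputStreamObj = new ByteArrayInputStream(emailRawData.getBytes(StandardCharsets.UTF_8));
        MimeMessage mimeMessageObj = new MimeMessage(newSession, inputStreamObj);

        if (mimeMessageObj.isMimeType("multipart/*")) //Content may contain attachments
        {
            Multipart multiPart = (Multipart) mimeMessageObj.getContent();
            int numberOfParts = multiPart.getCount();
            for (int partCount = 0; partCount < numberOfParts; partCount++)
            {
                MimeBodyPart part = (MimeBodyPart) multiPart.getBodyPart(partCount);
                String fileName = part.getFileName();
                //This part is an attachment (only top-level parts are checked); skip it if it has no file name
                if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition()) && fileName != null)
                {
                    //Keep only the name itself, in case the sender included a path
                    File file = new File(saveDirectory, new File(fileName).getName());
                    part.saveFile(file);
                }
            }
        }
    }
    catch (MessagingException | IOException ex)
    {
        return "FAILED: "+ex.getLocalizedMessage();
    }
    return "SUCCESS";
}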
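
One caveat when saving attachments: the value returned by part.getFileName() comes from the email itself, so it can be missing or can contain path segments such as "..\". A possible way to guard against that, using only the standard library, is sketched below. The class `AttachmentPaths`, the method `resolveSafely` and the fallback name `attachment.bin` are all hypothetical names chosen for this illustration, not part of JavaMail.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class AttachmentPaths {

    private static final String FALLBACK_NAME = "attachment.bin"; // hypothetical default

    /** Resolves an untrusted attachment name to a path directly inside saveDir. */
    static Path resolveSafely(Path saveDir, String untrustedName) {
        String leaf = (untrustedName == null) ? "" : untrustedName.trim();
        // Keep only the last path segment, whichever separator the sender used.
        leaf = leaf.replace('\\', '/');
        leaf = leaf.substring(leaf.lastIndexOf('/') + 1);
        if (leaf.isEmpty() || leaf.equals(".") || leaf.equals("..")) {
            leaf = FALLBACK_NAME;
        }
        Path base = saveDir.toAbsolutePath().normalize();
        Path target = base.resolve(leaf).normalize();
        // Defensive check: the result must still be inside the save directory.
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("Attachment name escapes the save directory");
        }
        return target;
    }

    public static void main(String[] args) {
        Path dir = Paths.get("attachments");
        System.out.println(resolveSafely(dir, "report.pdf"));
        System.out.println(resolveSafely(dir, "../../etc/passwd"));
        System.out.println(resolveSafely(dir, null));
    }
}
```

In the method above, the resulting path could be passed on as `part.saveFile(resolveSafely(Paths.get(saveDirectory), part.getFileName()).toFile())`. This sketch does not handle two attachments that share the same name; the second would overwrite the first.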
