MimeMessageParser unable to fetch from address

We have been stuck on this issue for quite some time now. In our project we parse an email that has been written to a file and load its data into a POJO. This works in most cases, but when the email address is long it is moved onto the next line, and as a result the from address is not fetched; the display name is returned instead. We are using commons-email 1.4.

The input file containing the email message has:

case1:

From: "def, abc [CCC-OT]" <abc.def@test.com> // here the address is fetched correctly

For a longer email address, the file has:

case2:

From: "defxacdhf, abc [CCC-OT]" 
<abc.defxacdhf@test.com> // here the address jumps to the next line, so the fetched from address contains the name

Here is the sample code:

    import java.io.ByteArrayInputStream;
    import org.apache.commons.mail.util.MimeMessageParser;
    import org.apache.commons.mail.util.MimeMessageUtils;

    ByteArrayInputStream byteArrayStream = new ByteArrayInputStream(
            FileUtils.getStreamAsByteArray(buffInStream, lengthOfFile));
    // MimeMessage message = new MimeMessage(mailSession, byteArrayStream);
    MimeMessageParser mimeParser = new MimeMessageParser(
            MimeMessageUtils.createMimeMessage(mailSession, byteArrayStream));
    MimeMessageParser parsedMessage = mimeParser.parse();

When we try to get the from address:

    emailData.setFromAddress(parsedMessage.getFrom());

In case1 it returns abc.def@test.com, but in case2 it returns "defxacdhf, abc [CCC-OT]". Any help here is appreciated.

EDIT: the script that writes the file reads and writes each line like this:

    while read line
    do
        echo "$line" >> /directory/$FILE_NAME
    done

As discussed:

This is not an error in any of the libraries used; rather, the input does not conform to the RFC.

Quoting from RFC 822:

3.1.1. LONG HEADER FIELDS

  Each header field can be viewed as a single, logical line of ASCII characters, comprising a field-name and a field-body. For convenience, the field-body portion of this conceptual entity can be split into a multiple-line representation; this is called "folding". The general rule is that wherever there may be linear-white-space (NOT simply LWSP-chars), a CRLF immediately followed by AT LEAST one LWSP-char may instead be inserted. 
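The folding rule quoted above can be illustrated with a small JDK-only sketch. `HeaderUnfolding` and `unfold` are hypothetical names introduced here for illustration; the mail library is expected to perform this unfolding itself on well-formed input, so this is not the library's code. The sketch assumes the header section has already been split into lines with their CRLFs removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating RFC 822 section 3.1.1 unfolding:
// a line that begins with a space or tab continues the previous header line.
class HeaderUnfolding {

    // Joins each continuation line onto the header line it belongs to.
    static List<String> unfold(List<String> headerLines) {
        List<String> result = new ArrayList<>();
        for (String line : headerLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && !result.isEmpty()) {
                // Unfolding only removes the line break; the leading
                // whitespace of the continuation line is kept.
                int last = result.size() - 1;
                result.set(last, result.get(last) + line);
            } else {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The case2 header as it should appear on disk: the second line
        // still starts with a space, so it is a continuation line.
        List<String> folded = List.of(
                "From: \"defxacdhf, abc [CCC-OT]\"",
                " <abc.defxacdhf@test.com>");
        unfold(folded).forEach(System.out::println);
    }
}
```

Because the second line still begins with a space, it is joined back onto the `From:` line, so the display name and the address form one logical header again.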

I don't understand why you're using a shell while loop to read the data instead of just using cat or something like that, but the problem is in your use of "read". By default, read splits the input line into fields, separated by the field separators specified by the shell IFS variable. Leading and trailing IFS whitespace (by default, spaces and tabs) is stripped, so when you read a line that starts with white space, that white space is lost.

Change your loop to:

    while IFS= read -r line
    do
        # printf instead of echo, so lines such as "-n" are written verbatim
        printf '%s\n' "$line" >> "/directory/$FILE_NAME"
    done

That sets IFS to the empty string before each read, and specifies a "raw" read so that backslash characters aren't special.
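To see why the original loop breaks the From header, the sketch below models one effect of `read line` with the default IFS and a single variable: leading and trailing spaces and tabs are removed from the stored value. `ReadStripDemo` and its methods are hypothetical names, and the model deliberately ignores backslash handling and multi-variable splitting. Once the leading space is gone, the `<abc.defxacdhf@test.com>` line no longer counts as a continuation line, so the From header ends after the display name, which is consistent with what the question reports for case2.

```java
// Hypothetical model of how `read line` (default IFS, one variable)
// alters a folded header line. Backslash processing is not modelled.
class ReadStripDemo {

    // Spaces and tabs are the IFS whitespace that matters within one line;
    // read has already consumed the terminating newline.
    static boolean isIfsWhitespace(char c) {
        return c == ' ' || c == '\t';
    }

    // Approximates the value stored by `read line`: leading and trailing
    // spaces/tabs are removed.
    static String defaultIfsRead(String line) {
        int start = 0;
        int end = line.length();
        while (start < end && isIfsWhitespace(line.charAt(start))) {
            start++;
        }
        while (end > start && isIfsWhitespace(line.charAt(end - 1))) {
            end--;
        }
        return line.substring(start, end);
    }

    // RFC 822 continuation line: begins with a space or tab (LWSP-char).
    static boolean isContinuation(String line) {
        return !line.isEmpty() && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
    }

    public static void main(String[] args) {
        String folded = " <abc.defxacdhf@test.com>";
        String afterRead = defaultIfsRead(folded);
        System.out.println("before read: [" + folded + "] continuation=" + isContinuation(folded));
        System.out.println("after read:  [" + afterRead + "] continuation=" + isContinuation(afterRead));
    }
}
```

With `IFS= read -r line` the stored value is the line unchanged, so the leading space, and with it the continuation, survives.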

But unless you're doing something else in that read loop, it would be much simpler to just do:

    cat > "/directory/$FILE_NAME"
