
Perl parsing email body without parts using MIME::Parser

I have a Perl script that uses MIME::Parser to parse emails received on stdin, but it doesn't work on emails without parts. I have no way to modify the emails before they are sent.

I'd like to identify the significant part of each email, whether it's HTML or text, and store it in a buffer for later processing. Many of these emails come from a mailing list and appear to be generated automatically.

Sometimes they have just a single "Content-Type:" header and no boundaries:

MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Other times they have multiple text/plain parts, where one is the body of the email and another is a signature.

A few more header lines follow, and then the body appears directly, without any boundary markers.

My post from two years ago shows how I eventually figured out how to parse most emails that do have parts: "Parsing email with Email::MIME and multipart/mixed with subparts".

use strict;
use warnings;
use MIME::Parser;
use MIME::Entity;
use Email::MIME;
use Email::Simple;

my $parser = MIME::Parser->new;
$parser->extract_uuencode(1);
$parser->extract_nested_messages(1);
$parser->output_to_core(1);

# Slurp the whole message from STDIN
my $buf = do { local $/; <STDIN> };

my $entity = $parser->parse_data($buf);

$entity->dump_skeleton;
my $num_parts = $entity->parts;    # scalar context: number of sub-parts
for (my $i = 0; $i < $num_parts; $i++) {
    my $part = $entity->parts($i);
    my $content_type = $part->mime_type;
    my $body = $part->as_string;

    print "body: $body\n";
}

The body text is never printed; the only output is this, from dump_skeleton:

Content-type: text/plain
Effective-type: text/plain
Body-file: NONE
Subject: Security update 

I'd like to modify my existing script (shown in the earlier Stack Exchange post) so that it can also print emails like this one, which have no boundaries.

Are these emails badly formatted? I haven't been able to find an example of a library that reliably prints just the body, subject, and other basic headers of an email without elaborate part-by-part analysis of the whole message.

I know mimeexplode can do it, but I can't figure out how it does so. In any case, I need the mail body in a buffer so that I can manipulate it, so calling a command-line program like mimeexplode would be a roundabout approach.

It is not entirely clear to me what you are trying to achieve, since you only post code and do not describe the intention behind it in enough detail. However, you are using parts to inspect the message, and parts is clearly documented to return the sub-parts of a multipart/* entity or similar (i.e. message/rfc822); it does not handle single-part messages:

... returns the array of all sub parts, returning the empty array if there are none (eg, if this is a single part message, or a degenerate multipart). In a scalar context, this returns you the number of parts.
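To make that concrete, here is a small self-contained check, assuming a single-part message shaped like the one in the question (the body text is made up). It parses the message from a string with parse_data and shows that parts returns nothing, while the decoded body hangs off the top-level entity's bodyhandle.

```perl
use strict;
use warnings;
use MIME::Parser;

# A single-part message shaped like the one in the question (hypothetical body text).
my $raw = <<'EOF';
MIME-Version: 1.0
Subject: Security update
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This is the body.
EOF

my $parser = MIME::Parser->new;
$parser->output_to_core(1);              # keep decoded bodies in memory, no files on disk
my $entity = $parser->parse_data($raw);

my $num_parts = $entity->parts;          # scalar context: number of sub-parts
print "is_multipart: ", ($entity->is_multipart ? 1 : 0), "\n";
print "parts: $num_parts\n";             # 0, so a loop over parts() never runs

# For a single-part message the body belongs to the top-level entity itself.
my $body = $entity->bodyhandle->as_string;
print "body: $body";
```

This is why the loop in the question prints nothing: there are zero sub-parts to iterate over, even though the message clearly has a body.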

If you just want all parts, including standalone "parts" (i.e. a single entity that is not part of anything), use parts_DFS as in the following example, which prints the body of every entity that has one:

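use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory instead of writing files to disk
my $entity = $parser->parse(\*STDIN);
# parts_DFS returns the entity itself followed by all nested parts, depth-first
for my $part ($entity->parts_DFS) {
    defined(my $body = $part->bodyhandle) or next; # has no body, likely multipart or similar
    print "body: ".$body->as_string."\n";
}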
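As a sanity check of why parts_DFS covers both cases, the sketch below parses two hypothetical in-memory messages, one single-part and one multipart/mixed with a body and a signature part, and counts what parts and parts_DFS return and how many of those entities carry a bodyhandle. The sample messages and the %count hash are made up for illustration.

```perl
use strict;
use warnings;
use MIME::Parser;

# Two hypothetical sample messages: one single-part, one multipart/mixed.
my $single = <<'EOF';
MIME-Version: 1.0
Subject: Security update
Content-Type: text/plain

Single-part body
EOF

my $multi = <<'EOF';
MIME-Version: 1.0
Subject: Security update
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Main body
--XYZ
Content-Type: text/plain

Signature
--XYZ--
EOF

my $parser = MIME::Parser->new;
$parser->output_to_core(1);              # keep everything in memory

my %count;
for my $name ('single', 'multi') {
    my $entity = $parser->parse_data($name eq 'single' ? $single : $multi);
    my @all    = $entity->parts_DFS;                     # entity itself + nested parts
    my @bodies = grep { defined $_->bodyhandle } @all;   # leaf parts carry a body
    $count{$name} = [ scalar($entity->parts), scalar(@all), scalar(@bodies) ];
    printf "%-6s parts=%d parts_DFS=%d with_body=%d\n", $name, @{ $count{$name} };
}
```

For the single-part message parts is 0 but parts_DFS still returns one entity with a body; for the multipart message parts_DFS returns the container plus its two text parts, and only the two leaves have a body.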

EDIT: Given your updated question, you are not looking for all parts but for the main text part. It is not easy to determine which part is the main one, but you might try using the first text/* part that is inline. That would look something like this:

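use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory
my $entity = $parser->parse(\*STDIN);
for my $part ($entity->parts_DFS) {
    defined(my $body = $part->bodyhandle) or next; # has no body, likely multipart or similar
    next unless $part->mime_type =~ m{^text/}i;    # only consider text/* parts
    # a part without a Content-Disposition header is treated as inline
    if (my $disp = $part->head->get('content-disposition')) {
        next if $disp !~ m{^\s*inline}i;
    }
    print "body: ".$body->as_string."\n";
    last;    # stop after the first matching part
}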
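Since the question needs the body in a buffer rather than printed, the "first inline text/* part" heuristic can be wrapped in a small sub that returns the decoded body as a string, alongside the Subject header read from the top-level head. This is a sketch under assumptions: main_text_body is a hypothetical helper name, and the sample message (an attached text file followed by the real body) is made up to exercise the disposition check.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: return the decoded body of the first inline text/* part,
# or undef if there is none.
sub main_text_body {
    my ($entity) = @_;
    for my $part ($entity->parts_DFS) {
        my $bh = $part->bodyhandle or next;                 # skip multipart containers
        next unless $part->mime_type =~ m{^text/}i;         # text/plain, text/html, ...
        my $disp = $part->head->get('Content-Disposition');
        next if defined $disp && $disp !~ m{^\s*inline}i;   # no header counts as inline
        return $bh->as_string;
    }
    return;
}

my $raw = <<'EOF';
MIME-Version: 1.0
Subject: Security update
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain
Content-Disposition: attachment; filename="notes.txt"

Attached notes
--XYZ
Content-Type: text/plain

Main body
--XYZ--
EOF

my $parser = MIME::Parser->new;
$parser->output_to_core(1);
my $entity  = $parser->parse_data($raw);
my $subject = $entity->head->get('Subject');   # header value includes a trailing newline
chomp $subject if defined $subject;
my $buffer  = main_text_body($entity);         # body kept in a buffer for later processing
print "subject: $subject\n";
print "body: $buffer\n";
```

The same call works unchanged on a single-part message, because parts_DFS then yields just the top-level entity. If HTML-only mails should be handled differently from plain text, the mime_type test is the place to add a preference order.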

