

Perl parsing email body without parts using MIME::Parser

I have a Perl script that uses MIME::Parser to parse emails received from stdin, but it doesn't work on emails without parts. I have no ability to modify the emails before they are sent.

I'd like to be able to identify the significant part of the email, regardless of whether it's HTML or text, and store it in a buffer for processing later. Many of these emails come from a mailing list and appear to be generated automatically.

Sometimes they seem to just have one "Content-Type:" header with no boundaries.

MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

There are a few other header lines after this, but then the body is just displayed without any boundary markers.

Other times they have multiple text/plain parts, where one is the body of the email and another is a signature.

This is my post from two years ago showing how I was eventually able to parse most emails with parts: "Parsing email with Email::MIME and multipart/mixed with subparts".

use strict;
use MIME::Parser;
use MIME::Entity;
use Email::MIME;
use Email::Simple;
my $parser = MIME::Parser->new;
$parser->extract_uuencode(1);
$parser->extract_nested_messages(1);
$parser->output_to_core(1);
my $buf;
while(<STDIN> ){
        $buf .= $_; 
}

my $entity = $parser->parse_data($buf);

$entity->dump_skeleton;
my $num_parts = $entity->parts;
for (my $i=0; $i < $num_parts; $i++) {
    my $part = $entity->parts($i);
    my $content_type = $part->mime_type;
    my $body = $part->as_string;

    print "body: $body\n";
}

The body text is never printed. Only the following is printed, by dump_skeleton:

Content-type: text/plain
Effective-type: text/plain
Body-file: NONE
Subject: Security update 

I'd really like to modify my existing script (shown in the previous Stack Exchange post) so that it can also print emails like this, which have no boundaries.

Is this poor formatting? I've been unable to locate any examples of a library that can be used to just print the body, subject, and other basic headers of an email reliably without sophisticated steps to analyze the whole message by parts.

I know mimeexplode can do it, but I can't figure out how. I need to store the mail body in a buffer to manipulate, so using a command-line program like mimeexplode would be a roundabout way of doing it anyway.

It is not fully clear to me what you are trying to achieve, since you posted only code and did not describe the intent behind it in sufficient detail. But you are using parts to inspect the message, which is clearly documented to return the sub-parts of a multipart/* or similar entity (e.g. message/rfc822) and does not handle single-part messages:

... returns the array of all sub parts, returning the empty array if there are none (e.g., if this is a single part message, or a degenerate multipart). In a scalar context, this returns you the number of parts.

If you just want all parts, including standalone "parts" (i.e. a single entity which is not part of anything), use parts_DFS as in the following example, which prints the body of every entity that has one:

use MIME::Parser;
my $parser = MIME::Parser->new;
my $entity = $parser->parse(\*STDIN);
for my $part ($entity->parts_DFS) {
    defined(my $body = $part->bodyhandle) or next; # has no body, likely multipart or similar
    print "body: ".$body->as_string."\n";
}
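
To make the difference between parts and parts_DFS concrete, here is a minimal sketch that parses a single-part message from a string instead of STDIN. The sample message in $raw is made up for illustration; only the MIME::Parser and MIME::Entity calls already used in this thread are assumed.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical sample: a single-part message with no boundary markers.
my $raw = <<'EOM';
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: Security update

Please install the update.
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);              # keep decoded bodies in memory
my $entity = $parser->parse_data($raw);

my $direct = $entity->parts;             # scalar context: number of direct sub-parts
my @all    = $entity->parts_DFS;         # the entity itself, then any nested parts

print "direct sub-parts: $direct\n";
print "entities from parts_DFS: ", scalar(@all), "\n";

# Basic headers come from the top-level head; get() keeps the trailing newline.
my $subject = $entity->head->get('Subject');
print "subject: $subject" if defined $subject;

for my $part (@all) {
    defined(my $body = $part->bodyhandle) or next;   # containers have no body
    print "body: ", $body->as_string;
}
```

For a message like this, a loop over parts never runs because there are no sub-parts, while parts_DFS still yields the top-level entity, whose bodyhandle holds the text.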

EDIT: given your updated question, you are not looking for all parts but for the main text part. It is not easy to determine what the actual main part is, but you might try to use the first text/* part which is inline. This would probably look something like this:

use MIME::Parser;
my $parser = MIME::Parser->new;
my $entity = $parser->parse(\*STDIN);
for my $part ($entity->parts_DFS) {
    defined(my $body = $part->bodyhandle) or next; # has no body, likely multipart or similar
    if (my $disp = $part->head->get('content-disposition')) {
        next if $disp !~ m{inline}i;
    }
    print "body: ".$body->as_string."\n";
    last;
}
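
The loop above filters on Content-Disposition only. If the first entity with a body could be a non-text part, a type check can be added as well. The following is a rough sketch under that assumption: first_inline_text is a hypothetical helper name and the sample message is invented for illustration, so treat it as a starting point rather than a definitive way to find the main part.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: return the decoded body of the first text/* entity
# that is not marked as something other than inline, or undef if none exists.
sub first_inline_text {
    my ($entity) = @_;
    for my $part ($entity->parts_DFS) {
        defined(my $body = $part->bodyhandle) or next;    # multipart containers have no body
        next unless $part->effective_type =~ m{^text/}i;  # only text/plain, text/html, ...
        my $disp = $part->head->get('content-disposition');
        next if defined $disp && $disp !~ m{inline}i;     # skip attachments
        return $body->as_string;
    }
    return;
}

# Hypothetical sample: an attached text file followed by the actual message text.
my $raw = <<'EOM';
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"
Subject: Security update

--XYZ
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: attachment; filename="notes.txt"

Attached notes.
--XYZ
Content-Type: text/plain; charset="us-ascii"

Main body text.
--XYZ--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);               # keep decoded bodies in memory
my $entity = $parser->parse_data($raw);

my $text = first_inline_text($entity);
print defined $text ? "body: $text\n" : "no inline text part found\n";
```

Because the helper walks parts_DFS, it handles single-part messages without boundaries and nested multiparts alike. The returned string can be kept in a buffer for later processing instead of being printed.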
