
HTML email to plain text with MIME::Entity

I'm using a Perl script to convert HTML mails to plain text.

The current code (for multipart mails) looks like this:

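use strict;
use warnings;
use MIME::Parser;
use MIME::Entity;
use HTML::TreeBuilder;
use HTML::FormatText;

my $parser = MIME::Parser->new;
my $entity = $parser->parse(\*STDIN) or die "parse failed\n";

for my $part ($entity->parts()) {
    if ($part->mime_type eq 'text/html') {
        my $bh = $part->bodyhandle;

        # Parse the HTML body into a tree (the body is assumed to be raw UTF-8 bytes)
        my $tree = HTML::TreeBuilder->new();
        $tree->utf8_mode(1);    # called without an argument, this only reads the flag
        $tree->parse($bh->as_string);
        $tree->eof;

        # Render the tree as plain text wrapped at 72 columns
        my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
        my $txt = $formatter->format($tree);

        my $txtEntity = MIME::Entity->build(Data     => $txt,
                                            Type     => "text/plain",
                                            Encoding => "8bit");

        # Inserts the new part at index 0; the HTML part itself stays in place
        $entity->add_part($txtEntity, 0);
    }
}
$entity->print(\*STDOUT);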

It works, but it just adds the plain text part to the existing parts and doesn't replace the HTML part.
So I came up with this:

my $head = $entity->head;

# Build a fresh top-level message, copying over only these four headers
my $txtEntity = MIME::Entity->build(Data     => $txt,
                                    Type     => "text/plain",
                                    Encoding => "8bit",
                                    From     => $head->get('From', 0),
                                    To       => $head->get('To', 0),
                                    Subject  => $head->get('Subject', 0),
                                    Cc       => $head->get('Cc', 0));

$txtEntity->print(\*STDOUT);

But that could remove some parts of the email header. Is there a function to replace the HTML body completely with the plain text one?

Thanks!

If you don't have a way to replace the body instead of adding a new part, this might be a job for the formail utility (part of procmail), which can generate a new email with the headers of the old email, replacing the things you want to replace (like the encoding and content-type headers).
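
For comparison, the same idea (keep the old headers, overwrite only the content-type and encoding headers, swap the body) can be roughly sketched in Perl with MIME-tools itself. This is only a sketch under assumptions: the helper name swap_in_plain_body and the sample message are made up for illustration, the plain text is assumed to have been produced already, and no charset parameter is set on the new Content-Type.

```perl
use strict;
use warnings;
use MIME::Parser;
use MIME::Body;

# Hypothetical helper: keep every existing header, but turn the entity
# into a single text/plain body holding $text.
sub swap_in_plain_body {
    my ($entity, $text) = @_;
    my $head = $entity->head;
    # Overwrite only the two MIME headers; add a charset parameter here if known.
    $head->replace('Content-Type', 'text/plain');
    $head->replace('Content-Transfer-Encoding', '8bit');
    $entity->parts([]);                                   # drop any sub-parts
    $entity->bodyhandle(MIME::Body::InCore->new($text));  # install the new body
    return $entity;
}

# Made-up sample message, parsed from a string instead of STDIN.
my $raw = <<'EOM';
From: alice@example.com
To: bob@example.com
Subject: Greetings
X-Custom: keep-me
MIME-Version: 1.0
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: 7bit

<html><body><p>Hello <b>world</b></p></body></html>
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory
my $entity = $parser->parse_data($raw) or die "parse failed\n";

swap_in_plain_body($entity, "Hello world\n");
$entity->print(\*STDOUT);
```

Because the original MIME::Head object is reused, headers such as X-Custom in the sample survive without being copied one by one.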

Also, you might just try changing the content type to text/plain. You will still see the HTML code, but it will not render, and you will also see your text/plain addition, though I grant this is a poor solution.
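
For the multipart case in the question, one possible way to replace rather than add is to rebuild the list of parts and hand it back through the ARRAYREF form of MIME::Entity's parts method. The following is a minimal sketch, not a definitive implementation: the helper names html_to_text and replace_html_parts and the sample message are hypothetical, only direct sub-parts are examined (nested multiparts are left alone), and charset handling is ignored.

```perl
use strict;
use warnings;
use MIME::Entity;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: render an HTML string as plain text wrapped at 72 columns.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
    my $text = $formatter->format($tree);
    $tree->delete;    # free the parse tree
    return $text;
}

# Hypothetical helper: swap each direct text/html part for a text/plain part.
sub replace_html_parts {
    my ($entity) = @_;
    return $entity unless $entity->is_multipart;
    my @new_parts;
    for my $part ($entity->parts) {
        if ($part->mime_type eq 'text/html' && $part->bodyhandle) {
            push @new_parts, MIME::Entity->build(
                Top      => 0,    # a sub-part, not a top-level message
                Type     => 'text/plain',
                Encoding => '8bit',
                Data     => html_to_text($part->bodyhandle->as_string),
            );
        }
        else {
            push @new_parts, $part;    # keep every other part as it is
        }
    }
    $entity->parts(\@new_parts);    # ARRAYREF form: set the list of parts
    return $entity;
}

# Made-up sample message built in memory instead of being read from STDIN.
my $msg = MIME::Entity->build(
    Type    => 'multipart/mixed',
    From    => 'alice@example.com',
    Subject => 'Greetings',
);
$msg->attach(Type => 'text/html',  Data => '<p>Hello <b>world</b></p>');
$msg->attach(Type => 'text/plain', Data => "An untouched part\n");

replace_html_parts($msg);
$msg->print(\*STDOUT);
```

With a message parsed by MIME::Parser, the same call would go between the parse and the final print; the top-level header is never touched, so nothing is lost from it.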
