Get just the 'text/plain' bit of a MIME email

I'm parsing emails for a project I'm working on. So far I connect to a POP3 mail server, download all of the mail that's there, and loop through it getting the sender, subject and body.

I then decode the base64 body, which leaves me with a multipart MIME message, like the following test email I sent myself...

Multipart MIME email

I need to split this multipart MIME email body so that I have one string containing just the plain-text version of the mail and another string containing the HTML part.

I'm not interested in anything else the mail might have; attachments and suchlike can all be dropped.

Can anyone point me in the right direction?

If I end up using a third-party control, does anyone know of anything freeware that can do this? I would never need to encode, just decode.

Assuming you have extracted the headers from the email, so that you can get the string used to identify the part boundaries, you can get some way through the parsing with code like this:

Imports System.IO
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim sampleEmail = File.ReadAllText("C:\temp\SampleEmail.eml")
        Dim getBoundary As New Regex("boundary=(.*?)\r\n")
        Dim possibleBoundary = getBoundary.Matches(sampleEmail)
        Dim boundary = ""
        If possibleBoundary.Count = 0 Then
            Console.WriteLine("Could not find boundary specifier.")
            End
        End If

        ' the boundary string may or may not be surrounded by double-quotes
        boundary = possibleBoundary(0).Groups(1).Value.Trim(CChar(""""))

        Console.WriteLine(boundary)

        boundary = vbCrLf & "--" & boundary
        Dim parts = Regex.Split(sampleEmail, Regex.Escape(boundary))

        ' Length is used rather than Count so that no System.Linq import is needed
        Console.WriteLine("Number of parts: " & parts.Length.ToString())

        ' save the parts to one text file for inspection
        Using sw As New StreamWriter("C:\temp\EmailParts.txt")
            For i = 0 To parts.Length - 1
                ' this is where you would find the part with "Content-Type: text/plain;" -
                ' you may also need to look at the charset, e.g. charset="utf-8"
                sw.WriteLine("PART " & i.ToString())
                sw.WriteLine(parts(i))
            Next
        End Using

        Console.ReadLine()

    End Sub

End Module

The email I used to test that did not involve any base64 encoding.
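The loop above only dumps the parts to a file. As a rough illustration of the step its comments point at (find the part whose headers say text/plain, check the charset, undo any base64 or quoted-printable transfer encoding), here is a minimal sketch. It is written in Python rather than VB.NET so that it runs with nothing but the standard library; `find_text_part` and `SAMPLE` are made-up names, and the code assumes CRLF line endings and a single, non-nested multipart level, which real mail often violates.

```python
import base64
import quopri
import re

# Hypothetical test message: one base64 plain-text part and one HTML part.
SAMPLE = "\r\n".join([
    "From: sender@example.com",
    "Subject: Test",
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="b1"',
    "",
    "preamble",
    "--b1",
    'Content-Type: text/plain; charset="utf-8"',
    "Content-Transfer-Encoding: base64",
    "",
    "SGVsbG8sIHBsYWluIHdvcmxkIQ==",
    "--b1",
    'Content-Type: text/html; charset="utf-8"',
    "",
    "<p>Hello, HTML world!</p>",
    "--b1--",
    "",
])


def find_text_part(raw_email, subtype="plain"):
    """Return the decoded body of the first text/<subtype> part, or None."""
    match = re.search(r'boundary="?([^";\r\n]+)"?', raw_email, re.IGNORECASE)
    if match is None:
        return None
    delimiter = "\r\n--" + match.group(1)

    # Element 0 is the message headers plus preamble, so skip it.
    for part in raw_email.split(delimiter)[1:]:
        # Headers and body of a part are separated by the first blank line.
        headers, sep, body = part.lstrip("\r\n").partition("\r\n\r\n")
        if not sep:
            continue  # closing delimiter ("--") or a malformed part
        lowered = headers.lower()
        wanted = r"^content-type:\s*text/" + re.escape(subtype)
        if not re.search(wanted, lowered, re.MULTILINE):
            continue

        charset_match = re.search(r'charset="?([^";\s]+)', headers, re.IGNORECASE)
        charset = charset_match.group(1) if charset_match else "us-ascii"

        cte_match = re.search(r"^content-transfer-encoding:\s*(\S+)", lowered, re.MULTILINE)
        transfer_encoding = cte_match.group(1) if cte_match else "7bit"

        if transfer_encoding == "base64":
            data = base64.b64decode(body)
        elif transfer_encoding == "quoted-printable":
            data = quopri.decodestring(body.encode("ascii", errors="replace"))
        else:
            return body  # 7bit/8bit: nothing to undo
        return data.decode(charset, errors="replace")
    return None


if __name__ == "__main__":
    print(find_text_part(SAMPLE))
    print(find_text_part(SAMPLE, "html"))
```

Calling `find_text_part(raw, "html")` gives the second string the question asks for. Messages with attachments usually nest a multipart/alternative inside a multipart/mixed with a different boundary, which this sketch does not follow.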

I would recommend using my free, open-source MimeKit library for this task rather than a regex solution.

I don't really know VB.NET, so the following code snippet might not be quite right (I'm a C# guy), but it should give you the general idea of how to accomplish the task:

Imports MimeKit

' TextBody / HtmlBody are Nothing when the message has no such part
Dim message = MimeMessage.Load("C:\email.msg")
Dim html = message.HtmlBody
Dim text = message.TextBody

As you can see, MimeKit makes this sort of thing trivial.
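MimeKit is a .NET library. Purely to illustrate what a dedicated MIME parser takes off your hands (nested multiparts, transfer encodings, charsets, skipping attachments), here is a comparable sketch using Python's standard `email` package. This is not MimeKit's API; `extract_bodies` and `RAW` are hypothetical names, and the sample message is invented for the example.

```python
from email import message_from_string, policy

# Hypothetical message: text and HTML alternatives plus one attachment.
RAW = "\r\n".join([
    "Subject: Test",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="outer"',
    "",
    "--outer",
    'Content-Type: multipart/alternative; boundary="inner"',
    "",
    "--inner",
    'Content-Type: text/plain; charset="utf-8"',
    "",
    "Plain version",
    "--inner",
    'Content-Type: text/html; charset="utf-8"',
    "",
    "<p>HTML version</p>",
    "--inner--",
    "--outer",
    "Content-Type: application/octet-stream",
    'Content-Disposition: attachment; filename="data.bin"',
    "Content-Transfer-Encoding: base64",
    "",
    "AAEC",
    "--outer--",
    "",
])


def extract_bodies(raw_email):
    """Return (plain_text, html) for a message; either may be None."""
    message = message_from_string(raw_email, policy=policy.default)

    # get_body() walks nested multiparts and ignores attachments.
    plain_part = message.get_body(preferencelist=("plain",))
    html_part = message.get_body(preferencelist=("html",))

    # get_content() undoes the transfer encoding and applies the charset.
    plain = plain_part.get_content() if plain_part is not None else None
    html = html_part.get_content() if html_part is not None else None
    return plain, html


if __name__ == "__main__":
    text, html = extract_bodies(RAW)
    print(text)
    print(html)
```

The attachment never shows up in either result, which matches the requirement that everything except the text and HTML bodies can be dropped.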

