
Get just the 'text/plain' bit of a MIME email

I'm parsing emails for a project I'm working on. So far I connect to a POP3 mail server, download all of the mail that's there, and loop through it, getting the sender, subject and body.

I then decode the Base64 body, which leaves me with a multipart MIME message like the following test email I sent myself:

[Image: multipart MIME email]
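For reference, the Base64 step mentioned above can be done with the framework's `Convert.FromBase64String`. This is a minimal sketch, assuming the decoded bytes are UTF-8 unless another charset is passed in; `DecodeBase64Body` is a hypothetical helper name, not part of any library.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module Base64BodyDemo

    ' Hypothetical helper: decodes a Base64-encoded body into a string.
    ' Assumption: the decoded bytes use the given charset (UTF-8 by default);
    ' pass the charset from the Content-Type header if it says otherwise.
    Function DecodeBase64Body(encoded As String, Optional charset As String = "utf-8") As String
        ' Convert.FromBase64String skips whitespace, including the CRLFs
        ' that wrap long Base64 bodies onto short lines.
        Dim bytes = Convert.FromBase64String(encoded)
        Return Encoding.GetEncoding(charset).GetString(bytes)
    End Function

    Sub Main()
        ' "Hello, world!" in Base64, wrapped over two lines as in a real email
        Dim body = "SGVsbG8s" & vbCrLf & "IHdvcmxkIQ=="
        Console.WriteLine(DecodeBase64Body(body)) ' prints "Hello, world!"
    End Sub

End Module
```

Note that RFC 2045 only allows 7bit, 8bit or binary as the Content-Transfer-Encoding of a multipart entity, so Base64 normally applies to the individual parts, each declared by its own Content-Transfer-Encoding header.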

I need to be able to split this multipart MIME email body so that I end up with one string containing just the plain-text version of the mail and another containing the HTML part.

I'm not interested in anything else the mail might contain; attachments and suchlike can all be dropped.

Can anyone point me in the right direction?

If I end up using a third-party control, does anyone know of a freeware one that can do this? I would only ever need to decode, never encode.

Assuming the email you extracted still has its headers, so that you can find the boundary string that separates the parts, you can get part of the way through the parsing with code like this:

Imports System.IO
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim sampleEmail = File.ReadAllText("C:\temp\SampleEmail.eml")
        ' MIME parameter names are case-insensitive, so also accept "Boundary="
        Dim getBoundary As New Regex("boundary=(.*?)\r\n", RegexOptions.IgnoreCase)
        Dim possibleBoundary = getBoundary.Matches(sampleEmail)
        Dim boundary = ""
        If possibleBoundary.Count = 0 Then
            Console.WriteLine("Could not find boundary specifier.")
            Return ' leave Main rather than using End, which halts the process abruptly
        End If

        ' the boundary string may or may not be surrounded by double-quotes
        boundary = possibleBoundary(0).Groups(1).Value.Trim(CChar(""""))

        Console.WriteLine(boundary)

        boundary = vbCrLf & "--" & boundary
        Dim parts = Regex.Split(sampleEmail, Regex.Escape(boundary))

        Console.WriteLine("Number of parts: " & parts.Length.ToString())

        ' save the parts to one text file for inspection
        Using sw As New StreamWriter("C:\temp\EmailParts.txt")
            For i = 0 To parts.Length - 1
                ' this is where you would find the part with "Content-Type: text/plain;" -
                ' you may also need to look at the charset, e.g. charset="utf-8"
                sw.WriteLine("PART " & i.ToString())
                sw.WriteLine(parts(i))
            Next
        End Using

        Console.ReadLine()

    End Sub

End Module

The email I tested this with did not use any Base64 encoding.
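The code above stops once the message is split into parts. The next step, picking out the text/plain and text/html parts and undoing their Content-Transfer-Encoding, could look roughly like the sketch below. `GetPartBody` and `DecodeQuotedPrintable` are hypothetical helpers, not part of any library, and the sketch assumes CRLF line endings, headers that are not folded over several lines, UTF-8 content, and no nested multiparts.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MimePartDemo

    ' Hypothetical helper: takes the raw text of ONE part from the boundary
    ' split and returns its decoded body if its Content-Type is wantedType
    ' (e.g. "text/plain"), otherwise Nothing.
    ' Assumptions: a blank line (CRLF CRLF) ends the part's headers, headers
    ' are not folded over several lines, and the content is UTF-8.
    Function GetPartBody(part As String, wantedType As String) As String
        Dim headerEnd = part.IndexOf(vbCrLf & vbCrLf, StringComparison.Ordinal)
        If headerEnd < 0 Then Return Nothing
        Dim headers = part.Substring(0, headerEnd)
        Dim body = part.Substring(headerEnd + 4)

        Dim opts = RegexOptions.IgnoreCase Or RegexOptions.Multiline
        Dim typeMatch = Regex.Match(headers, "^Content-Type:\s*([^;\r\n]+)", opts)
        If Not typeMatch.Success OrElse
           Not typeMatch.Groups(1).Value.Trim().Equals(wantedType, StringComparison.OrdinalIgnoreCase) Then
            Return Nothing
        End If

        ' A missing Content-Transfer-Encoding header means 7bit, i.e. plain text.
        Dim cteMatch = Regex.Match(headers, "^Content-Transfer-Encoding:\s*(\S+)", opts)
        Dim transferEncoding = If(cteMatch.Success, cteMatch.Groups(1).Value.ToLowerInvariant(), "7bit")

        Select Case transferEncoding
            Case "base64"
                Return Encoding.UTF8.GetString(Convert.FromBase64String(body))
            Case "quoted-printable"
                Return DecodeQuotedPrintable(body)
            Case Else
                Return body ' 7bit / 8bit: already readable text
        End Select
    End Function

    ' Decodes quoted-printable text (RFC 2045): "=XX" is one hex-encoded byte,
    ' and "=" directly before CRLF is a soft line break that is removed.
    Function DecodeQuotedPrintable(input As String) As String
        Dim bytes As New List(Of Byte)()
        Dim i = 0
        Do While i < input.Length
            If input(i) = "="c AndAlso i + 2 < input.Length AndAlso input.Substring(i + 1, 2) = vbCrLf Then
                i += 3 ' soft line break: drop "=" CR LF
            ElseIf input(i) = "="c AndAlso i + 2 < input.Length AndAlso
                   Uri.IsHexDigit(input(i + 1)) AndAlso Uri.IsHexDigit(input(i + 2)) Then
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16))
                i += 3
            Else
                bytes.AddRange(Encoding.UTF8.GetBytes(input(i).ToString()))
                i += 1
            End If
        Loop
        Return Encoding.UTF8.GetString(bytes.ToArray())
    End Function

    Sub Main()
        ' Made-up parts shaped like the output of Regex.Split in the code above.
        Dim parts = {
            vbCrLf & "Content-Type: text/plain; charset=""utf-8""" & vbCrLf &
                "Content-Transfer-Encoding: quoted-printable" & vbCrLf & vbCrLf &
                "Caf=C3=A9 is open=" & vbCrLf & " today",
            vbCrLf & "Content-Type: text/html; charset=""utf-8""" & vbCrLf &
                "Content-Transfer-Encoding: base64" & vbCrLf & vbCrLf &
                "PGI+SGk8L2I+"
        }

        Dim text As String = Nothing
        Dim html As String = Nothing
        For Each part In parts
            If text Is Nothing Then text = GetPartBody(part, "text/plain")
            If html Is Nothing Then html = GetPartBody(part, "text/html")
        Next

        Console.WriteLine(text) ' Café is open today
        Console.WriteLine(html) ' <b>Hi</b>
    End Sub

End Module
```

The first part of the split (the top-level headers) and the trailing "--" epilogue have no matching text/* Content-Type, so they simply return Nothing. For anything beyond this simple shape, a real MIME parser is the safer choice.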

I would recommend using my free, open-source MimeKit library for this task rather than a regex-based solution.

I don't really know VB.NET, so the following code snippet might not be quite right (I'm a C# guy), but it should give you the general idea of how to accomplish the task you want:

Imports MimeKit

Dim message = MimeMessage.Load("C:\email.msg")
Dim html = message.HtmlBody
Dim text = message.TextBody

As you can see, MimeKit makes this sort of thing trivial.

