简体   繁体   中英

Golang: Encountering issue trying to parse Outlook msg files (emails) malformed MIME header: missing colon:

I have a task to parse both eml and msg formatted email files using Go. There's a wonderful package for parsing EML files, however, with MSG, no matter what package I research and attempt to implement, I encounter the same error every single time.

malformed MIME header: missing colon:

It isn't the msg file itself. I have the same service in .NET which reads the msg file perfectly (MsgReader library).

Could someone suggest a package I could use in Go to read msg files? I wonder if it's an encoding issue (this wasn't a problem with eml files).

I've tried using these packages:

As an example, here is one function I've tried to read an msg file.

func parse_msg_file() {

    var filePath string = "c://messages//kraken.msg"
    var reader io.Reader

    f, err := os.Open(filePath)
    checkerr(err, "file "+filePath+" not found or can not be readed")

    defer f.Close()

    reader = bufio.NewReader(f)

    msg, err := email.ParseMessage(reader)
    checkerr(err, "failed to parse raw msg file")
    if msg == nil {
        checkerr(err, "failed to parse raw msg file")
    }
}

and the output when I call the function is:

malformed MIME header: missing colon: "\xd0\xcf\x11\u0871\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00>\x00\x03\x00\xfe\xff\t\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\t\x00\x00\x00\x02\x00\x00\x00\xfe\xff\xff\xff\x00\x00\x00\x00\x03\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xffR\x00o\x00o\x00t\x00 \x00E\x00n\x00t\x00r\x00y\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x05\x00\xff\xff\xff\xff\xff\xff\xff\xff\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0\t-0r$\xd9\x01"
exit status 255

Just to add to my comment, I have searched for "msg parsers in go" in Google and it has brought up this repository - https://github.com/oucema001/OutlookMessageParser-Go . I don't know if it actually works - it's pretty old, and no documentation, so unlikely it'll be easy to use, but you can start from there.

The Compound File Binary File Format is

a general-purpose file format that provides a file-system-like structure within a file for the storage of arbitrary, application-specific streams of data.

I believe that this stuff all came from Microsoft's old OLE/COM stuff (Object Linking and Embedding/Component Object Model).

FWIW, here's a parser for the Compound File Binary File Format. No idea if it works, or anything else about it, but it might be, at least, a jumping-off point for you.

https://github.com/richardlehane/mscfb

[Edited to note]

Seems that the above package is a dependency of https://github.com/oucema001/OutlookMessageParser-Go , referenced in this answer by @astax .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM