Parsing an email message body

I'm using the Gmail API to parse the body of my Gmail messages. It works except when the body is HTML. Does anyone know how I can extract just the text from the email? If not, how can I skip emails that contain HTML?

Eventually I want to use this for personal and professional emails, which likely won't contain HTML.

import base64
import email

def message_converter(message_id):
    # `service` is an already-authorized Gmail API client created elsewhere.
    message = service.users().messages().get(userId='me', id=message_id, format='raw').execute()
    # Parse the decoded bytes directly so non-UTF-8 messages don't raise on decoding.
    raw_bytes = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
    mime_msg = email.message_from_bytes(raw_bytes)
    if mime_msg.is_multipart():
        for payload in mime_msg.get_payload():
            # if payload.is_multipart(): ...
            print(payload.get_payload())
    else:
        print(mime_msg.get_payload())

html2text does a pretty good job: it converts HTML into plain, Markdown-formatted text.

You may need to do additional parsing/formatting after the fact, however.
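
A minimal sketch of that idea, applied to a parsed `email` message like the one in the question: prefer a `text/plain` part when one exists, and convert HTML only when nothing else is available. The helper names (`extract_text`, `html_to_text`) are hypothetical, and the fallback uses the standard library's `html.parser` as a rough stand-in for html2text, so its output will be cruder than what html2text produces.

```python
import email
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Crude HTML-to-text conversion (stand-in for html2text)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.chunks)


def _decode(part):
    """Decode a leaf part's payload to str using its declared charset."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace")


def extract_text(mime_msg):
    """Return the plain-text body, converting HTML only as a fallback."""
    html_body = None
    for part in mime_msg.walk():
        if part.is_multipart() or part.get_filename():
            continue  # skip containers and attachments
        content_type = part.get_content_type()
        if content_type == "text/plain":
            return _decode(part)
        if content_type == "text/html" and html_body is None:
            html_body = _decode(part)
    return html_to_text(html_body) if html_body is not None else ""
```

In the question's function, `print(extract_text(mime_msg))` could stand in for the whole `if`/`else` block that prints the payloads.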

I don't know whether this will help, but the Gmail API has the same structure in every language. In C#, the message text can be in one of three places, depending on the mail server:

msg.Payload.Parts[1].Body.Data;  // the message text without HTML tags

msg.Payload.Parts[0].Body.Data; // the message text with HTML tags

msg.Payload.Body.Data; // no choice here: the text comes with HTML tags
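
The same lookup can be sketched in Python against the `payload` dictionary that the Gmail API returns with `format='full'`. Because the position of the plain-text part varies, this sketch checks each part's `mimeType` instead of relying on an index. The function name `find_body` is hypothetical, the sample payload is made up for illustration, and the text is assumed to be UTF-8.

```python
import base64


def find_body(payload, mime_type="text/plain"):
    """Depth-first search of a Gmail API payload for the first part of mime_type.

    Returns the decoded text, or None if no such part carries inline data.
    """
    data = payload.get("body", {}).get("data")
    if payload.get("mimeType") == mime_type and data:
        # Body data is URL-safe base64; restore any stripped padding first.
        padded = data + "=" * (-len(data) % 4)
        # Assumption: the part is UTF-8 text; undecodable bytes are replaced.
        return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    for part in payload.get("parts", []):
        found = find_body(part, mime_type)
        if found is not None:
            return found
    return None


def _b64(text):
    """Encode sample text the way the API encodes body data."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


# Made-up payload shaped like a multipart/alternative message.
sample_payload = {
    "mimeType": "multipart/alternative",
    "body": {"size": 0},
    "parts": [
        {"mimeType": "text/plain", "body": {"data": _b64("Hi there")}},
        {"mimeType": "text/html", "body": {"data": _b64("<p>Hi there</p>")}},
    ],
}

print(find_body(sample_payload))               # Hi there
print(find_body(sample_payload, "text/html"))  # <p>Hi there</p>
```

With a real response, the argument would be `message['payload']` from a `messages().get(..., format='full')` call.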

This answer may help with what you are trying to do. As I understand it, you want to pull certain pieces of text out of the email body. You can use regular expressions for that. I made a video explaining how to get data out of a Gmail email body, though it uses Google Apps Script (JavaScript):

https://youtu.be/nI1OH3pAz6s?t=8

You can download the code from this GitHub link:

https://gist.github.com/MoayadAbuRmilah/5835369fdebbecf980029f7339e4d769
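
As a rough Python counterpart to the regular-expression approach (the linked code is Google Apps Script and is not reproduced here), the sketch below pulls labeled values out of a plain-text body that has already been extracted. The sample text, the field labels, and the function name `extract_fields` are all hypothetical.

```python
import re


def extract_fields(body, labels):
    """Return {label: value} for lines shaped like 'Label: value'."""
    fields = {}
    for label in labels:
        # re.escape guards against labels containing regex metacharacters.
        pattern = rf"^{re.escape(label)}\s*:\s*(.+)$"
        match = re.search(pattern, body, re.MULTILINE | re.IGNORECASE)
        if match:
            fields[label] = match.group(1).strip()
    return fields


# Made-up email body for illustration.
sample_body = "Hello,\nOrder ID: 12345\nTotal: $19.99\nThanks!"
print(extract_fields(sample_body, ["Order ID", "Total", "Coupon"]))
# {'Order ID': '12345', 'Total': '$19.99'}
```

Labels that do not appear in the body (here `Coupon`) are simply left out of the result.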
