I'm using the Gmail API to parse my Gmail message bodies. It works except when the body is HTML. Does anyone know how I can extract just the text of the email? If not, how can I ignore emails that contain HTML?
Eventually I want to use this on personal and professional emails, which probably won't contain HTML.
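import base64
import email

# Assumes `service` is an authorized Gmail API client, e.g. one built with
# googleapiclient.discovery.build('gmail', 'v1', credentials=creds).
def message_converter(message_id):
    message = service.users().messages().get(userId='me', id=message_id, format='raw').execute()
    # The 'raw' field is the full RFC 2822 message, base64url-encoded.
    msg_str = str(base64.urlsafe_b64decode(message['raw'].encode('ASCII')), 'UTF-8')
    mime_msg = email.message_from_string(msg_str)
    if mime_msg.is_multipart():
        for payload in mime_msg.get_payload():
            # if payload.is_multipart(): ...
            print(payload.get_payload())
    else:
        print(mime_msg.get_payload())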
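A minimal sketch of the "ignore HTML" option, assuming you fetch with format='raw' as above. It parses the bytes with the standard library's email package using policy.default and asks get_body() for a text/plain part only, so an HTML-only message comes back as None and can be skipped. The function name extract_plain_text is hypothetical.

```python
import email
from email import policy


def extract_plain_text(raw_bytes):
    """Return the text/plain body of an RFC 2822 message, or None if it has none.

    Messages that only carry text/html (or only attachments) yield None,
    so the caller can simply skip them.
    """
    # policy.default produces EmailMessage objects, which provide
    # get_body() and get_content().
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # get_body() walks the MIME tree; restricting preferencelist to 'plain'
    # means an HTML-only message returns None instead of its HTML part.
    part = msg.get_body(preferencelist=('plain',))
    if part is None:
        return None
    # get_content() undoes the transfer encoding and decodes the charset.
    return part.get_content()
```

For a Gmail message you would call it as extract_plain_text(base64.urlsafe_b64decode(message['raw'])). Parsing bytes with message_from_bytes also avoids decoding the whole raw message as UTF-8 up front; each part is decoded with its own declared charset instead.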
html2text does a pretty good job: it converts HTML into ASCII text.
You may need to do some additional parsing or formatting afterwards, however.
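html2text is a third-party package; if it is installed, html2text.html2text(html) returns the converted text. As a dependency-free alternative, here is a rough sketch built on the standard library's html.parser.HTMLParser. It is a swapped-in technique, not html2text itself: it only keeps text nodes, drops <script>/<style> content, and starts new lines at a few block-level tags. The names _TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>/<style> and breaking lines at block tags."""

    BLOCK_TAGS = {'p', 'div', 'br', 'li', 'tr', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}
    SKIP_TAGS = {'script', 'style'}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append('\n')

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append('\n')

    def handle_data(self, data):
        # Ignore anything inside <script> or <style>.
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return a plain-text approximation of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = ''.join(parser.chunks)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (' '.join(line.split()) for line in text.splitlines())
    return '\n'.join(line for line in lines if line)
```

Links, tables and lists lose their structure here, which is exactly the kind of follow-up formatting the answer warns about.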
I don't know if this helps, but the Gmail API has the same message structure in every client library, so in C# you can find the message text in three places, depending on the mail server that sent it (in each case Body.Data is base64url-encoded):
msg.Payload.Parts[1].Body.Data; // here you can find the message text without HTML tags
msg.Payload.Parts[0].Body.Data; // here you can find the message text with HTML tags
msg.Payload.Body.Data;          // here you have no choice: the text includes the HTML tags
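The same payload structure is available from Python if you fetch the message with format='full' instead of 'raw', e.g. payload = service.users().messages().get(userId='me', id=message_id, format='full').execute()['payload']. Because the part order can vary between senders, this sketch searches the parts by mimeType instead of relying on fixed indices. It assumes the text is UTF-8, and find_part_text is a hypothetical helper name.

```python
import base64


def find_part_text(payload, mime_type='text/plain'):
    """Depth-first search of a Gmail API message payload for a part of mime_type.

    Returns the decoded text, or None if no matching part has inline body data.
    """
    if payload.get('mimeType') == mime_type:
        data = payload.get('body', {}).get('data')
        if data:
            # body.data is base64url-encoded; restore any missing '=' padding.
            padded = data + '=' * (-len(data) % 4)
            # Assumption: the part's text is UTF-8.
            return base64.urlsafe_b64decode(padded).decode('utf-8', errors='replace')
    # Multipart payloads nest their sub-parts under 'parts'.
    for part in payload.get('parts', []):
        text = find_part_text(part, mime_type)
        if text is not None:
            return text
    return None
```

Call find_part_text(payload) for the plain-text version; if it returns None, the message has no text/plain part and you can skip it, or call find_part_text(payload, 'text/html') and strip the tags.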
This answer may help with what you are trying to do. I understand that you want to extract certain text from the body of the email; you can use regular expressions for that. I made a video explaining how to get data out of a Gmail message body, but using Google Apps Script (JavaScript):
https://youtu.be/nI1OH3pAz6s?t=8
You can download the code from this GitHub Gist:
https://gist.github.com/MoayadAbuRmilah/5835369fdebbecf980029f7339e4d769
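The same regular-expression idea in Python, applied to body text you have already extracted. This is not the code from the video or the Gist; the pattern is purely hypothetical (an order number such as "Order #12345"), so replace it with whatever you actually need to match.

```python
import re

# Hypothetical pattern: an order number written as "Order #12345".
# Adapt it to the text you actually want to pull out of the email.
ORDER_RE = re.compile(r'Order\s*#\s*(\d+)', re.IGNORECASE)


def find_order_numbers(body_text):
    """Return every order number that ORDER_RE matches in body_text."""
    # With one capture group, findall returns just the captured digits.
    return ORDER_RE.findall(body_text)
```

Running it on an HTML body also works, but tags between the words you are matching will break the pattern, so it is usually easier to extract or convert the plain text first.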