
How to remove everything after a certain keyword from email body using regex?

I wrote the code below to extract specific values from a particular email: the date, the index value (SGEPSBSH), and the bbg level.

I'm trying to save the result to a pandas DataFrame. Before saving the email body, I want to strip the client signature, that is, remove everything from the keyword "Regards" onward.

I get the following error:

File "snapper.py", line 39, in <module>
    Body_content = message.body
File "__init__.py", line 473, in __getattr__
    raise AttributeError("'%s' object has no attribute '%s'" % (repr(self), attr))
AttributeError: '<Library._MailItem instance at 0x2473706520480>' 
                object has no attribute 'body'

Can you please help me fix my code?

import win32com.client
import re
import os
import pandas
import datetime
from datetime import date

EMAIL_ACCOUNT = 'atul.sanwal@ihsmarkit.com'
EMAIL_SUBJ_SEARCH_STRING = 'SGEPSBSH Index Level'
EMAIL_CONTNT = {'Ticker': [], 'TickerLevel': [], 'DATE': []}
out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")

root_folder = out_namespace.GetDefaultFolder(6)
out_iter_folder = root_folder.Folders['Email_snapper']
char_length_of_search_substring = len(EMAIL_SUBJ_SEARCH_STRING)
item_count = out_iter_folder.Items.Count
Flag = False
cnt = 0
if out_iter_folder.Items.Count > 0:
    for i in range(item_count, 0, -1)[:2]:
        message = out_iter_folder.Items[i]
        #message = message.Restrict("[ReceivedTime] >= '" + lastWeekDateTime + "'")
Body_content = message.body
message.body = re.sub(r".*Regards[^\n]+\n[^\n]+", "",message.body)
print(Body_content)


If you are not set on using regex, simple string slicing might work for you as well. Note that str.find returns -1 when the keyword is missing, so check for that case before slicing:

s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus tincidunt elit in ex " \
    "molestie euismod sed et velit. Aenean blandit placerat sodales. Curabitur mattis nibh nec " \
    "leo hendrerit commodo. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras eu " \
    "mattis dui, at convallis dolor."
s = s[:s.find("amet")].strip()
print(s)

Out:

Lorem ipsum dolor sit
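
Applied to the question, the same idea can be sketched as follows. This is only a minimal sketch: the helper names cut_at_keyword and cut_at_keyword_re and the sample body (including the level 123.45) are made up for illustration, and it assumes the signature always starts at the first occurrence of "Regards". The second helper shows a regex variant for comparison.

```python
import re

def cut_at_keyword(text, keyword="Regards"):
    """Return text up to (not including) the first occurrence of keyword."""
    idx = text.find(keyword)
    # str.find returns -1 when the keyword is absent; leave the text unchanged then
    return text if idx == -1 else text[:idx].rstrip()

def cut_at_keyword_re(text, keyword="Regards"):
    """Regex variant: drop the keyword and everything that follows it."""
    # re.DOTALL lets "." match newlines, so the whole tail is removed
    pattern = re.escape(keyword) + r".*"
    return re.sub(pattern, "", text, flags=re.DOTALL).rstrip()

# Hypothetical email body, invented for this example
body = "Hi,\nSGEPSBSH Index Level: 123.45\n\nRegards,\nJohn Doe\nSome Company"
print(cut_at_keyword(body))
print(cut_at_keyword_re(body))
```

Both calls print the first two lines of the sample body and drop the signature. As for the AttributeError itself: with gencache.EnsureDispatch, win32com typically generates case-sensitive wrappers, so the Outlook property is probably message.Body rather than message.body. If so, cut_at_keyword(message.Body) would return the trimmed text without assigning anything back to the message.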

