I am trying to parse raw email data from a file. When I read the file with file.readlines() and pass the result to the email library, I get an error. If I use file.read() instead, only the first message in the file is parsed. How do I parse and analyze the raw mail data?
with open(file_path, "r") as file:
    content = file.readlines()
email_to_string = email.message_from_string(content)
headers = email_to_string._headers
header_contents = {}
for header in headers:
    if "From" in header:
        header_contents['From'] = header[-1]
    elif "To" in header:
        header_contents['To'] = header[-1]
    elif "Date" in header:
        header_contents['Date'] = header[-1]
    elif "Subject" in header:
        header_contents['Subject'] = header[-1]
print("HEADER CONTENTS", header_contents)
if email_to_string.is_multipart():
    body = []
    for lines in body.get_payload():
        body.append(lines)
    body = " ".join(body)
else:
    body = email_to_string.get_payload()
print("HEADER", headers)
print("HEADER CONTENTS", header_contents)
print("BODY", body)
**Error**
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    email_to_string = email.message_from_string(content)
  File "/usr/lib/python3.6/email/__init__.py", line 38, in message_from_string
    return Parser(*args, **kws).parsestr(s)
  File "/usr/lib/python3.6/email/parser.py", line 68, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
TypeError: initial_value must be str or None, not list
email.message_from_string() expects a string, but file.readlines() returns a list of lines. Use file.read() instead, which returns the entire file as a single string.
import email

with open(file_path, 'r') as file_:
    # Keep the newlines: the parser relies on them to separate headers from the body.
    content = file_.read()
email_to_string = email.message_from_string(content)
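Once the message is parsed, the headers and body can be read through the public Message API rather than the private _headers attribute. The sketch below is one possible way to do that, not the only one. The helper names summarize and summarize_mbox are made up for illustration. The second helper also rests on an assumption the question does not confirm: that the file holds several messages in mbox format (each message introduced by a "From " separator line), which would explain why a single message_from_string() call only returns the first mail.

```python
import email
import mailbox


def summarize(msg):
    """Return (headers, body) for one parsed email.message.Message.

    `summarize` is a hypothetical helper name, not part of the email library.
    """
    # Header lookup is case-insensitive and returns None when a header is absent.
    headers = {name: msg[name] for name in ("From", "To", "Date", "Subject")}

    if msg.is_multipart():
        # A multipart payload is a list of sub-messages, so collect the
        # text/plain parts and decode each one to str.
        parts = []
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                raw = part.get_payload(decode=True)
                charset = part.get_content_charset() or "utf-8"
                parts.append(raw.decode(charset, errors="replace"))
        body = "\n".join(parts)
    else:
        body = msg.get_payload()
    return headers, body


def summarize_mbox(path):
    """Summarize every message in a file.

    ASSUMPTION: the file at `path` is in mbox format. If it uses another
    layout, the messages have to be split some other way first.
    """
    # create=False avoids silently creating an empty file for a wrong path.
    box = mailbox.mbox(path, create=False)
    try:
        return [summarize(msg) for msg in box]
    finally:
        box.close()
```

For a single message, summarize(email.message_from_string(content)) returns the header dictionary and the body text. For a multi-message file that really is mbox, summarize_mbox(file_path) returns one (headers, body) pair per mail.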