Python: read a txt file with inline labels and format it into columns

Question

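I want to run a statistical analysis on my emails. To do this, I select the emails of interest in Outlook and save them as a txt file.

Here is a sample of what the file looks like (approximate, since I translated it):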

Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf

Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf

Send: monday 9 jully 2018 12:00
To: john doe
Cc: sister doe; brother doe; mother doe
Object: my data issue
enclosed: data.pdf

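Obviously, to manage my data it would be best to split it into columns. The column labels would be {Send, To, Cc, Object, enclosed}, with one row per email.

I'm sure there is a good way to do this, perhaps with pandas, but I haven't found the right keywords to turn up a working answer.

Any tips to help me?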

Answer 1

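Assumptions:

1) There is an empty line between each email's set of fields

2) Each set always has the same 5 fields (Send, To, Cc, Object, enclosed), and they always appear in the same order

3) No field is empty (for example, every email has an attachment)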

# Sample text: one "Label: value" pair per line, emails separated by a blank line
raw_text = """Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf

Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf

Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf"""

# One block of text per email
emails = raw_text.split('\n\n')

output = []

for email in emails:
    row = []
    for line in email.split('\n'):
        # Split on the first colon only, so values containing ':' (e.g. "12:00") stay intact
        row.append(line.split(':', 1)[1].strip())
    output.append(row)

print(output)

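output will be a list of lists: 3 rows by 5 columns for your example. It can be converted to a DataFrame later with relatively little effort if needed.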
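As a rough sketch of that conversion, the variant below keeps each email as a dict keyed by its label, which also relaxes assumptions 2 and 3: fields may appear in any order or be missing. The names parse_emails, to_dataframe and COLUMNS are made up for this illustration, and the second email in the sample (with no Cc or enclosed line) is hypothetical data added to show the missing-field case. It still assumes a blank line between emails and at most one line per field.

```python
COLUMNS = ["Send", "To", "Cc", "Object", "enclosed"]  # labels taken from the sample


def parse_emails(text):
    """Return one dict per email; emails are separated by a blank line."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            # partition() splits on the first colon only, so "12:00" survives
            label, sep, value = line.partition(":")
            if sep:  # ignore lines that carry no "Label:" prefix
                record[label.strip()] = value.strip()
        records.append(record)
    return records


def to_dataframe(records):
    """Build a DataFrame with one row per email; missing labels become NaN."""
    import pandas as pd  # imported here so parse_emails works without pandas
    return pd.DataFrame(records, columns=COLUMNS)


sample = (
    "Send: monday 9 jully 2018 12:00\n"
    "To: john doe\n"
    "Cc: sister doe; brother doe; mother doe\n"
    "Object: my data issue\n"
    "enclosed: data.pdf\n"
    "\n"
    "Send: monday 9 jully 2018 12:00\n"
    "To: john doe\n"
    "Object: my data issue\n"  # hypothetical email without Cc or attachment
)

records = parse_emails(sample)
print(records[0]["Send"])
print(records[1].get("Cc"))
```

With pandas installed, to_dataframe(records) should then give one row per email and one column per label. For a real file, the text could be read with open(path, encoding="utf-8").read() and passed to parse_emails in place of sample.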
