
python: reading an inline-label txt file and formatting it into columns

I want to run some statistical analysis on my emails. To do that, I select the emails I'm interested in within Outlook and save them to a txt file.

Here is a sample of what the file contains (approximately, since it has been translated):

 Send: monday 9 jully 2018 12:00 To: john doe Cc: sister doe; brother doe; mother doe Object: my data issue enclosed: data.pdf Send: monday 9 jully 2018 12:00 To: john doe Cc: sister doe; brother doe; mother doe Object: my data issue enclosed: data.pdf Send: monday 9 jully 2018 12:00 To: john doe Cc: sister doe; brother doe; mother doe Object: my data issue enclosed: data.pdf 

To work with this data, it would be much better to have it in columns: column labels {Send, To, Cc, Object, Enclosed} and one row per email.

I'm sure there is an elegant way to do this, perhaps with pandas, but I'm not using the right keywords to find useful answers.

Any tips to help me?

Assuming:

1) there is an empty line between the blocks of fields for each email

2) within each block there are always 5 fields (send, to, cc, object, enclosed) and they always appear in the same order

3) no field is ever empty (for example, every email has an attachment)
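If assumption 1 does not hold, for instance if the export really comes out as one long line like the sample in the question, splitting on blank lines will not work. A minimal sketch of an alternative is to split on the labels themselves with a regular expression. It assumes the only labels are Send, To, Cc, Object and enclosed, and that each email starts with "Send:"; parse_inline and LABELS are illustrative names, not part of any library:

```python
import re

# Sample text flattened onto one line, as in the question (two emails here)
raw = ("Send: monday 9 jully 2018 12:00 To: john doe "
       "Cc: sister doe; brother doe; mother doe Object: my data issue "
       "enclosed: data.pdf "
       "Send: monday 9 jully 2018 12:00 To: john doe "
       "Cc: sister doe; brother doe; mother doe Object: my data issue "
       "enclosed: data.pdf")

# Assumed label set; re.IGNORECASE lets "enclosed" and "Enclosed" both match
LABELS = ["Send", "To", "Cc", "Object", "enclosed"]
label_re = re.compile(r"\b(" + "|".join(LABELS) + r"):", re.IGNORECASE)


def parse_inline(text):
    """Split a flat 'Label: value Label: value ...' string into one dict per email."""
    # With a capturing group, re.split returns [before, label1, value1, label2, value2, ...]
    parts = label_re.split(text)
    emails = []
    current = {}
    for label, value in zip(parts[1::2], parts[2::2]):
        key = label.capitalize()  # "enclosed" -> "Enclosed"
        if key == "Send" and current:
            # assumption: every email starts with "Send:", so a new one begins here
            emails.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        emails.append(current)
    return emails


rows = parse_inline(raw)
print(len(rows))  # → 2
print(rows[0])
```

Because the values are stripped, the same function also works when each label sits on its own line.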

text = """Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf

Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf

Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf"""

# one block of lines per email; blocks are separated by an empty line
emails = text.strip().split('\n\n')

output = []

for email in emails:
    row = []
    for line in email.split('\n'):
        # split on the first ':' only, so the time in "12:00" is kept whole
        row.append(line.split(':', 1)[1].strip())
    output.append(row)

print(output)
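If assumptions 2 and 3 do not hold (fields appear in a different order, or some emails have no Cc or no attachment), keying each value by its label instead of its position may be safer. This is a sketch under those assumptions; the sample text is hypothetical (the first email has no Cc and no attachment, the second lists Cc first), and parse_blocks and COLUMNS are illustrative names:

```python
# Hypothetical sample: same layout as above, but with a missing and a reordered field
text = """Send:     monday 9 jully 2018 12:00
To:       john doe
Object:   my data issue

Cc:       sister doe; brother doe; mother doe
Send:     monday 9 jully 2018 12:00
To:       john doe
Object:   my data issue
enclosed: data.pdf"""

COLUMNS = ["Send", "To", "Cc", "Object", "Enclosed"]


def parse_blocks(text):
    """Return one dict per email, keyed by label, so order and missing fields don't matter."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            if ":" not in line:
                continue  # skip lines that carry no label
            label, value = line.split(":", 1)  # first ':' only, keeps "12:00" intact
            record[label.strip().capitalize()] = value.strip()
        records.append(record)
    return records


# Missing fields become empty strings, so every row still has five columns
rows = [[rec.get(col, "") for col in COLUMNS] for rec in parse_blocks(text)]
print(rows)
```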

output will be a list of lists: 3 rows by 5 columns for your sample. It can easily be converted to a pandas DataFrame later when needed.
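A minimal sketch of that conversion, assuming pandas is installed and using the column labels from the question; the list below just rebuilds the three identical rows from the sample:

```python
import pandas as pd

# Three identical rows, like the output list built from the sample
output = [
    ["monday 9 jully 2018 12:00", "john doe",
     "sister doe; brother doe; mother doe", "my data issue", "data.pdf"]
    for _ in range(3)
]

# Column labels taken from the question
df = pd.DataFrame(output, columns=["Send", "To", "Cc", "Object", "Enclosed"])
print(df.shape)  # → (3, 5)

# Example of a simple statistic: number of emails per subject
print(df["Object"].value_counts())
```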
