python: reading an inline label txt file and formatting it to columns

Question

I want to run a statistical analysis on my emails. To do that, I select the emails I am interested in with Outlook and save them to a txt file.

Here is a sample of what the file contains (approximate, because I translated it):

 Send: monday 9 jully 2018 12:00 To: john doe Cc: sister doe; brother doe; mother doe Object: my data issue enclosed: data.pdf Send: monday 9 jully 2018 12:00 To: john doe Cc: sister doe; brother doe; mother doe Object: my data issue enclosed: data.pdf Send: monday 9 jully 2018 12:00 To: john doe Cc: sister doe; brother doe; mother doe Object: my data issue enclosed: data.pdf

Clearly, the data would be easier to work with if it were shaped into columns: column labels {Send, To, Cc, Object, Enclosed} and one row per email.

I'm sure there is an elegant way to do that, perhaps with pandas, but I'm not using the right keywords to find useful answers.

Any tips to help me?

Answer 1

Assuming:

1) there is an empty line between consecutive email records

2) each record always has the same 5 fields (send, to, cc, object, enclosed), one per line, and they always appear in the same order

3) no field is empty (for example, every email has an attachment)

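# Sample data; in practice, read this text from the exported txt file.
# Named raw_text rather than input, to avoid shadowing the built-in input().
raw_text = """Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf

Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf

Send:     monday 9 jully 2018 12:00
To:       john doe
Cc:       sister doe; brother doe; mother doe
Object:   my data issue
enclosed: data.pdf"""

# Records are separated by one empty line.
emails = raw_text.split('\n\n')

output = []

for email in emails:
    row = []
    for line in email.split('\n'):
        # Split on the first colon only, so that values containing a colon
        # (such as the time "12:00") are kept whole.
        row.append(line.split(':', 1)[1].strip())
    output.append(row)

print(output)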

The result, output, is a list of lists: 3 rows by 5 columns for this example. It can be converted to a DataFrame later if needed, for example with pandas.DataFrame(output, columns=['Send', 'To', 'Cc', 'Object', 'Enclosed']).
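If the export really is one long line, as the sample in the question suggests, or if some emails lack a Cc or an attachment, the three assumptions above no longer hold. The following is only a sketch of one possible alternative, not a tested solution for real Outlook exports: it splits the text on the labels themselves with a regular expression and builds one dict per email. The label list, the function name parse_emails and the rule that every email starts with "Send" are assumptions taken from the sample. A subject line that itself contains something like "To:" would be split incorrectly.

```python
import re

# Labels assumed from the question's sample; adjust them to the real export.
LABELS = ["Send", "To", "Cc", "Object", "enclosed"]

# Matches any known label followed by a colon, e.g. "Cc:".
LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r"):\s*")


def parse_emails(text):
    """Return one dict per email, keyed by label (hypothetical helper)."""
    # With a capturing group, re.split keeps the labels in the result:
    # ['', 'Send', 'monday ... ', 'To', 'john doe ', ...]
    parts = LABEL_RE.split(text)
    records = []
    current = None
    for label, value in zip(parts[1::2], parts[2::2]):
        # Assumption: every email starts with the "Send" label.
        if label == LABELS[0]:
            current = {}
            records.append(current)
        if current is not None:
            current[label] = value.strip()
    return records


sample = (
    "Send: monday 9 jully 2018 12:00 To: john doe "
    "Cc: sister doe; brother doe; mother doe "
    "Object: my data issue enclosed: data.pdf "
    "Send: monday 9 jully 2018 12:00 To: john doe "
    "Object: no copy, no attachment"
)

records = parse_emails(sample)
for record in records:
    print(record)
```

Because each record is a dict, a missing field is simply absent from that dict. If pandas is installed, pandas.DataFrame(records) should give one column per label and one row per email, with NaN where a field is missing.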
