Get all the emails and the word just before the email starts

Question

I am trying to parse my dataset to get all the emails, each together with the word just before it. For example, given a row like this:

sno                                                text
1        From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam and my email is samwise@gmail.com

Then I want to capture it as:

sno                                                text                                              emails
1        From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam and my email is samwise@gmail.com    [From: m.kro@b.org, To: Cha.Sh@dys.com, is samwise@gmail.com]

What I have tried so far:

I have used the "findall" string method to get all the emails, but I cannot work out how to also capture the word just before each email.

df['Full Comments'].str.findall(r'(\S+@\S+)').str[0]

Any help on this is appreciated. Thank you.

Answer 1

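Try extractall with a pattern that captures the preceding word together with the address, then collect the matches into one list per row:

pat = r'([\w:]+ [\w.]+@[\w.]+)'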

df['emails'] = df.text.str.extractall(pat).groupby(level=0)[0].agg(list)
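To make the line above easier to try out, here is a minimal self-contained sketch. The one-row DataFrame is built from the sample in the question; the intermediate name matches is only for illustration.

```python
import pandas as pd

# Sample row taken from the question.
df = pd.DataFrame({
    "sno": [1],
    "text": ["From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam "
             "and my email is samwise@gmail.com"],
})

# One capture group: the preceding word, a space, then the address.
pat = r'([\w:]+ [\w.]+@[\w.]+)'

# extractall returns one row per match, indexed by (original row, match number).
matches = df["text"].str.extractall(pat)

# Grouping on index level 0 gathers the matches back into one list per original row.
df["emails"] = matches.groupby(level=0)[0].agg(list)

print(df["emails"].iloc[0])
# ['From: m.kro@b.org', 'To: Cha.Sh@dys.com', 'is samwise@gmail.com']
```

Rows in which the pattern never matches do not appear in the grouped result, so they end up as NaN in the emails column after the assignment.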

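Update: you can promote the word to a column title with unstack. This needs two capture groups, one for the word and one for the address, so that extractall returns them as columns 0 and 1:

pat2 = r'([\w:]+) ([\w.]+@[\w.]+)'

emails = (df.text.str.extractall(pat2)
            .reset_index('match', drop=True)   # drop the match counter, keep the row index
            .set_index([0], append=True)[1]    # index by (row, word), keep the address
            .unstack()                         # one column per preceding word
         )

df = df.join(emails)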

Output (without the join part):

0       From:             To:                 is 
0  m.kro@b.org  Cha.Sh@dys.com  samwise@gmail.com
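One caveat with the wide layout: unstack cannot reshape when the same preceding word occurs twice in one row (for example two "To:" addresses). As a hedged alternative, not part of the original answer, the sketch below keeps each row's matches as a list of (word, email) tuples using str.findall with two capture groups. The names pair_pat and pairs are illustrative.

```python
import pandas as pd

# Sample row taken from the question.
df = pd.DataFrame({
    "text": ["From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam "
             "and my email is samwise@gmail.com"],
})

# Two capture groups: the preceding word, then the address.
pair_pat = r'([\w:]+) ([\w.]+@[\w.]+)'

# With more than one group, findall yields a list of tuples per row.
df["pairs"] = df["text"].str.findall(pair_pat)

print(df["pairs"].iloc[0])
# [('From:', 'm.kro@b.org'), ('To:', 'Cha.Sh@dys.com'), ('is', 'samwise@gmail.com')]
```

Rows without any match get an empty list here, and repeated words are kept in order of appearance.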
