I am trying to parse my dataset to get all the emails, along with the word just before each email. For example, if I have a row like this:
sno text
1 From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam and my email is samwise@gmail.com
Then I want to capture it as:
sno text emails
1 From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam and my email is samwise@gmail.com [From: m.kro@b.org, To: Cha.Sh@dys.com, is samwise@gmail.com]
What I have tried so far:
I have used str.findall to get all the emails, but I am having trouble getting the word just before each email.
df['Full Comments'].str.findall(r'(\S+@\S+)').str[0]
Any help on this is appreciated. Thank you.
Try:
pat = r'([\w:]+ [\w.]+@[\w.]+)'
df['emails'] = df.text.str.extractall(pat).groupby(level=0)[0].agg(list)
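For reference, here is a minimal self-contained sketch of the approach above, run on the sample row from the question. The DataFrame construction is an assumption made for illustration; only the pattern and the extractall/groupby chain come from the answer. Rows without any match would end up as NaN in the new column.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed layout: columns "sno" and "text").
df = pd.DataFrame({
    "sno": [1],
    "text": ["From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam "
             "and my email is samwise@gmail.com"],
})

# One capture group: the preceding word, a space, then the email address.
pat = r"([\w:]+ [\w.]+@[\w.]+)"

# extractall returns one row per match, indexed by (original row, match number).
matches = df["text"].str.extractall(pat)

# Collect the matches of each original row back into a single list.
df["emails"] = matches.groupby(level=0)[0].agg(list)

print(df["emails"].iloc[0])
```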
Update: to promote the preceding word to a column title, capture the word and the email in two separate groups and use unstack:
pat = r'([\w:]+) ([\w.]+@[\w.]+)'
emails = (df.text.str.extractall(pat)
            .reset_index('match', drop=True)
            .set_index([0], append=True)[1]
            .unstack()
         )
df = df.join(emails)
Output (without the join part):
0        From:             To:                 is
0  m.kro@b.org  Cha.Sh@dys.com  samwise@gmail.com
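As a rough illustration of the wide layout with more than one row, the sketch below uses a two-group pattern (word and email captured separately) and adds a second, made-up row; that row and the variable names are assumptions, not part of the original data. Words that do not occur in a given row come out as NaN. Note that unstack should raise an error if the same preceding word appears twice within one row, so the list-based version is the safer choice for such data.

```python
import pandas as pd

# First row is from the question; the second row is hypothetical.
df = pd.DataFrame({
    "text": [
        "From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam "
        "and my email is samwise@gmail.com",
        "Reply to bob@x.io",
    ],
})

# Two capture groups: group 0 is the preceding word, group 1 is the email.
pat = r"([\w:]+) ([\w.]+@[\w.]+)"

emails = (
    df["text"].str.extractall(pat)
    .reset_index("match", drop=True)    # drop the per-row match counter
    .set_index([0], append=True)[1]     # index by (row, word), keep the email
    .unstack()                          # one column per distinct word
)

# Attach the wide columns to the original frame; alignment is by row index.
wide = df.join(emails)
print(wide.drop(columns="text"))
```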