
Get all the emails and the word just before each email

I am trying to parse my dataset to get all the emails, together with the word just before each email. For example, if I have a row like this:

sno                                                text
1        From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam and my email is samwise@gmail.com

Then I want to capture it as:

sno                                                text                                              emails
1        From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam and my email is samwise@gmail.com    [From : m.kro@b.org ,To: Cha.Sh@dys.com, is samwise@gmail.com] 

What I have tried so far:

I have tried the str.findall method to get all the emails, but I am having a problem getting the word just before each email.

df['Full Comments'].str.findall(r'(\S+@\S+)').str[0]

Any help on this is appreciated. Thank you.

Try:

pat = r'([\w:]+ [\w\.]+@[\w\.]+)'

df['emails'] = df.text.str.extractall(pat).groupby(level=0)[0].agg(list)
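As a minimal, self-contained sketch of the line above: the sample DataFrame is rebuilt from the single row shown in the question, and the column name `text` follows the answer rather than the `Full Comments` column used in the question's attempt. The pattern is the same regex written as a raw string.

```python
import pandas as pd

# Sample row copied from the question; only this one row is assumed.
df = pd.DataFrame({
    "sno": [1],
    "text": ["From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam "
             "and my email is samwise@gmail.com"],
})

# One capture group: the preceding word, a single space, then the email.
pat = r"([\w:]+ [\w.]+@[\w.]+)"

# extractall returns one row per match, indexed by (original row, match number).
matches = df["text"].str.extractall(pat)

# Grouping on level 0 collects the matches back into one list per original row.
df["emails"] = matches.groupby(level=0)[0].agg(list)

print(df["emails"].iloc[0])
# ['From: m.kro@b.org', 'To: Cha.Sh@dys.com', 'is samwise@gmail.com']
```

Rows whose text contains no match do not appear in the `extractall` result, so they end up as NaN in `emails` after the index-aligned assignment.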

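Update: you can promote the word to a column title with unstack. This needs the word and the email in separate capture groups, so that extractall returns columns 0 and 1:

pat = r'([\w:]+) ([\w\.]+@[\w\.]+)'

emails = (df.text.str.extractall(pat)
       .reset_index('match', drop=True)   # keep only the original row index
       .set_index([0], append=True)[1]    # index by (row, word); values are the emails
       .unstack()                         # one column per preceding word
    )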

df = df.join(emails)

Output (without the join part):

0       From:             To:                 is 
0  m.kro@b.org  Cha.Sh@dys.com  samwise@gmail.com
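For reference, here is a runnable sketch of the unstack variant including the join. It assumes the pattern is split into two capture groups (word and email), which is what the `[0]` and `[1]` column lookups require. The sample DataFrame is again rebuilt from the question's single row.

```python
import pandas as pd

# Sample row copied from the question.
df = pd.DataFrame({
    "sno": [1],
    "text": ["From: m.kro@b.org To: Cha.Sh@dys.com Hi my name is Sam "
             "and my email is samwise@gmail.com"],
})

# Assumption: two capture groups, so extractall yields column 0 (word)
# and column 1 (email).
pat = r"([\w:]+) ([\w.]+@[\w.]+)"

emails = (
    df["text"].str.extractall(pat)
      .reset_index("match", drop=True)   # drop the per-row match counter
      .set_index([0], append=True)[1]    # index by (row, word); values are emails
      .unstack()                         # pivot the words into column labels
)

# Attach the new columns to the original frame by row index.
result = df.join(emails)
print(result[["From:", "To:", "is"]])
```

One caveat with this layout: if the same preceding word occurs twice in one row (for example two `To:` addresses), `unstack` raises a ValueError because the (row, word) index is no longer unique. The list-based version above does not have that restriction.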

