Extract words surrounding a word and insert the results in a dataframe column

I have a dataframe, df, with 3 columns as follows:

company | year | text  
Apple   | 2016 |"The Company sells its products worldwide through its..."  

I would like to search for "products" in df['text'], extract the 3 words before it and the 3 words after it, and insert them into two new columns of the data frame, df['before'] and df['after'], respectively.

This is what I have done so far:

import re

# Fails: re.search expects a single string, but df['text'] is a Series
m = re.search(r'((?:\w+\W+){,3})(products)\W+((?:\w+\W+){,3})', df['text'])
if m:
    l = [x.strip().split() for x in m.groups()]
    df['left'], df['right'] = l[0], l[2]

However, I am getting this message:

TypeError: expected string or buffer

How can I make it work?

Use pd.Series.str.extract, which applies the regular expression to every row of the column:

pat = r'(?P<before>(?:\w+\W+){,3})products\W+(?P<after>(?:\w+\W+){,3})'
new = df.text.str.extract(pat, expand=True)

new

               before                     after
0  Company sells its   worldwide through its...
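Two details of the extracted frame may be worth handling: each capture keeps the whitespace consumed by the trailing \W+, and rows that do not contain "products" come back as NaN. The sketch below illustrates both on a small Series; the second row is a hypothetical example added for illustration and is not from the original question.

```python
import pandas as pd

texts = pd.Series([
    "The Company sells its products worldwide through its...",
    "Revenue grew during the year",  # hypothetical row without the keyword
])

# Named groups become the column names of the extracted frame.
pat = r"(?P<before>(?:\w+\W+){,3})products\W+(?P<after>(?:\w+\W+){,3})"
new = texts.str.extract(pat, expand=True)

# \W+ leaves trailing whitespace in each capture; strip it column by column.
new = new.apply(lambda col: col.str.strip())

# Rows without a match hold NaN in both columns.
no_match = new["before"].isna()
```

Whether to keep the NaN values or replace them (for example with an empty string via fillna) depends on how the columns are used afterwards.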

You can create a new dataframe that includes the new columns:

df.assign(**new)

  company  year                                               text                     after              before
0   Apple  2016  The Company sells its products worldwide throu...  worldwide through its...  Company sells its 
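For completeness, the original attempt failed because re.search works on one string at a time, while df['text'] is a whole Series. If lists of words are wanted rather than strings, one possible approach is to run the regex row by row. This is only a sketch: the helper name words_around is made up for illustration, and the sample frame is rebuilt from the question.

```python
import re
import pandas as pd

PATTERN = re.compile(r"((?:\w+\W+){,3})products\W+((?:\w+\W+){,3})")

def words_around(text):
    """Return (before, after) word lists; both are empty when nothing matches."""
    if not isinstance(text, str):  # e.g. NaN in a missing cell
        return [], []
    m = PATTERN.search(text)
    if m is None:
        return [], []
    # Keep only the word characters of each captured chunk.
    return re.findall(r"\w+", m.group(1)), re.findall(r"\w+", m.group(2))

df = pd.DataFrame({
    "company": ["Apple"],
    "year": [2016],
    "text": ["The Company sells its products worldwide through its..."],
})

# Apply the helper to each row, then split the pairs into two columns.
pairs = [words_around(t) for t in df["text"]]
df["before"] = [before for before, _ in pairs]
df["after"] = [after for _, after in pairs]
```

Unlike str.extract, this stores Python lists in the cells, which is closer to what the split() call in the question was aiming for, but it will usually be slower on large frames.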

