Extract the words around a word and insert the results into dataframe columns

Question

I have a dataframe, df, with 3 columns as follows:

company | year | text
Apple   | 2016 | "The Company sells its products worldwide through its..."

I would like to search for "products" in df['text'], extract the 3 words before and after "products", and insert them into two new columns of the data frame, df['before'] and df['after'], respectively.

This is what I have done so far:

import re

# re.search expects a single string, but df['text'] is a whole Series
m = re.search(r'((?:\w+\W+){,3})(products)\W+((?:\w+\W+){,3})', df['text'])
if m:
    l = [x.strip().split() for x in m.groups()]
    df['left'], df['right'] = l[0], l[2]

However, I am getting this message:

TypeError: expected string or buffer

How can I make it work?
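The TypeError arises because re.search accepts one string at a time, while df['text'] is a whole pandas Series. A minimal sketch of keeping the question's own re.search approach is to run the pattern on each row separately. The helper name split_context is hypothetical, and it returns the surrounding words as space-joined strings (rather than lists) so that each cell holds a plain string.

```python
import re

import pandas as pd

# One-row frame copied from the question (the text is truncated there, too).
df = pd.DataFrame({
    "company": ["Apple"],
    "year": [2016],
    "text": ["The Company sells its products worldwide through its..."],
})

# Same pattern as in the question, compiled once.
PATTERN = re.compile(r"((?:\w+\W+){,3})(products)\W+((?:\w+\W+){,3})")


def split_context(text):
    """Hypothetical helper: return (before, after) around 'products', or ("", "")."""
    m = PATTERN.search(text)  # operates on one string, so no TypeError
    if m is None:
        return "", ""
    before, _, after = m.groups()
    # The captured runs keep trailing whitespace; collapse them to single-spaced words.
    return " ".join(before.split()), " ".join(after.split())


# Apply the search row by row instead of passing the whole Series to re.search.
results = [split_context(text) for text in df["text"]]
df["left"] = [before for before, _ in results]
df["right"] = [after for _, after in results]
print(df[["company", "left", "right"]])
```

This works, but it loops in Python; the vectorized str.extract approach below is usually simpler.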

Answer 1

Use pd.Series.str.extract:

# Named groups become the column names of the result
pat = r'(?P<before>(?:\w+\W+){,3})products\W+(?P<after>(?:\w+\W+){,3})'
new = df.text.str.extract(pat, expand=True)

new

               before                     after
0  Company sells its   worldwide through its...
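A self-contained sketch of the extract step follows. It uses the question's row plus one made-up row (hypothetical, not from the question) without "products", to show that non-matching rows come back as NaN. It also strips the trailing space that the before group captures in front of "products".

```python
import pandas as pd

# The question's row plus one hypothetical row that lacks "products".
df = pd.DataFrame({
    "company": ["Apple", "Example Co"],
    "year": [2016, 2016],
    "text": [
        "The Company sells its products worldwide through its...",
        "This sentence has no match",
    ],
})

# Raw string avoids invalid-escape warnings for \w and \W.
pat = r"(?P<before>(?:\w+\W+){,3})products\W+(?P<after>(?:\w+\W+){,3})"

# Named groups become the column names; rows without a match get NaN.
new = df["text"].str.extract(pat, expand=True)

# 'before' keeps the space in front of "products"; strip both columns (NaN stays NaN).
new = new.apply(lambda col: col.str.strip())
print(new)
```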

You can then create a new dataframe with the extracted columns added:

df.assign(**new)

  company  year                                               text                     after              before
0   Apple  2016  The Company sells its products worldwide throu...  worldwide through its...  Company sells its
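Note that df.assign returns a new DataFrame and leaves df unchanged, so the result has to be rebound (for example df = df.assign(**new)) to keep it. df.join(new) is an equivalent that aligns on the index. The after-before column order in the output above likely reflects an older pandas version that sorted assign's keyword arguments alphabetically; recent versions keep the order of new's columns. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "company": ["Apple"],
    "year": [2016],
    "text": ["The Company sells its products worldwide through its..."],
})
pat = r"(?P<before>(?:\w+\W+){,3})products\W+(?P<after>(?:\w+\W+){,3})"
new = df["text"].str.extract(pat, expand=True)

# assign() returns a new frame; df itself keeps its original three columns.
out = df.assign(**new)

# join() aligns on the index and adds the same two columns.
joined = df.join(new)

print(df.columns.tolist())
print(out.columns.tolist())
```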

Answer 1: score 3, accepted, 2017-08-02 20:36:35