Extract words surrounding a word and insert the results into dataframe columns
I have a dataframe, df, with 3 columns as follows:
company | year | text
Apple | 2016 |"The Company sells its products worldwide through its..."
I would like to search for "products" in df['text'], extract the 3 words before it and the 3 words after it, and insert them into two new columns of the dataframe, df['before'] and df['after'], respectively.
This is what I have done so far:
m = re.search(r'((?:\w+\W+){,3})(products)\W+((?:\w+\W+){,3})', df['text'])
if m:
    l = [x.strip().split() for x in m.groups()]
    df['left'], df['right'] = l[0], l[2]
However, I am getting this message:
TypeError: expected string or buffer
How can I make it work?
Use pd.Series.str.extract:
# Raw string so that \w and \W reach the regex engine unchanged
pat = r'(?P<before>(?:\w+\W+){,3})products\W+(?P<after>(?:\w+\W+){,3})'
new = df.text.str.extract(pat, expand=True)
new
              before                     after
0  Company sells its   worldwide through its...
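As a self-contained sketch of the same idea, the snippet below rebuilds the one-row example from the question (the text is truncated there, so the truncated string is used as-is) and runs the extraction. Writing the quantifier as {0,3} rather than {,3} and stripping the surrounding whitespace from the captured groups are choices made here for illustration, not part of the original answer.

```python
import pandas as pd

# One-row sample taken from the question; the text is truncated in the
# original post and is kept exactly as shown there.
df = pd.DataFrame({
    "company": ["Apple"],
    "year": [2016],
    "text": ["The Company sells its products worldwide through its..."],
})

# Up to three "word + separator" chunks on each side of the keyword.
# {0,3} is written out explicitly; Python's re treats {,3} the same way.
pat = r"(?P<before>(?:\w+\W+){0,3})products\W+(?P<after>(?:\w+\W+){0,3})"

# The named groups become the column names of the result.
new = df["text"].str.extract(pat, expand=True)

# Assumption: trailing separators in the captures are not wanted, so strip them.
new = new.apply(lambda col: col.str.strip())

print(new)
```

Rows whose text does not contain the keyword come back as NaN in both columns, so they can be filtered or filled afterwards as needed.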
You can then create a new dataframe that includes the new columns:
df.assign(**new)
  company  year                                               text                     after              before
0   Apple  2016  The Company sells its products worldwide throu...  worldwide through its...  Company sells its
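The TypeError in the question arises because re.search expects a single string, while df['text'] is a whole Series. For comparison, a rough sketch of keeping the re-based approach is to run the search one row at a time. The helper name words_around and the decision to return None when there is no match are assumptions made for illustration.

```python
import re
import pandas as pd

# Same idea as the pattern in the question: up to three words on each side.
PATTERN = re.compile(r"((?:\w+\W+){0,3})products\W+((?:\w+\W+){0,3})")


def words_around(text):
    """Return (before, after) for the first match, or (None, None)."""
    # Hypothetical helper: guards against non-string cells such as NaN.
    if not isinstance(text, str):
        return None, None
    m = PATTERN.search(text)
    if m is None:
        return None, None
    return m.group(1).strip(), m.group(2).strip()


# One-row sample from the question, with the text truncated as in the post.
df = pd.DataFrame({
    "company": ["Apple"],
    "year": [2016],
    "text": ["The Company sells its products worldwide through its..."],
})

# Apply the search to each string individually, then split the pairs
# into the two target columns.
pairs = [words_around(t) for t in df["text"]]
df["before"] = [before for before, _ in pairs]
df["after"] = [after for _, after in pairs]

print(df[["before", "after"]])
```

This loops in Python, so on large frames the vectorised str.extract call above is likely the better fit; the per-row version is mainly useful when extra logic is needed for each match.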