Search for a partial string match in a data frame column from a list - Pandas - Python
Partial keyword match not working when I am trying to create a new column from a pandas data frame in python?
I have a data frame with a Description column, as described below.
Description
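I am trying to run a keyword search on the Description column, and I have the keywords in a list.
My current code only checks for exact matches, not partial matches. If a row contains more than one keyword, the keywords should be joined with a delimiter and written to a new column.
My code: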
import pandas as pd

data = pd.read_excel('path_to_datafile.xlsx')
keywords = ['dinner', 'government', 'Agents', 'entertainment', 'Agent']
keywords_lower = [item.lower() for item in keywords]
s = set(keywords_lower)
# Keeps only whole whitespace-separated words that equal a keyword, joined with '/'
data['Keyword'] = data['Description'].apply(lambda x: '/'.join(set(x.lower().split()).intersection(s)))
How can I get partial matches as well?
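For context, here is a minimal sketch of why a set intersection over `split()` tokens finds only whole-word matches. The sample sentence and the two helper functions are hypothetical and are not part of the original post; only the keyword list comes from the question.

```python
keywords = ['dinner', 'government', 'Agents', 'entertainment', 'Agent']
s = {k.lower() for k in keywords}

def exact_token_matches(text):
    # split() yields whole whitespace-separated tokens, so a token such as
    # "dinner." or "governmental" is never equal to a keyword
    return set(text.lower().split()) & s

def substring_matches(text):
    # A keyword counts as soon as it occurs anywhere inside the text
    lowered = text.lower()
    return {k for k in s if k in lowered}

sample = "Governmental entertainment, dinner."  # hypothetical description
exact = exact_token_matches(sample)
partial = substring_matches(sample)
print(exact)    # empty: punctuation and suffixes break token equality
print(partial)  # the three keywords that occur as substrings
```

Trailing punctuation and word suffixes are enough to defeat the token comparison, which is why a substring or regex search is needed here.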
extractall will do the job, but you have to build the pattern first:
...
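import re  # needed for the re.I flag below
keywords_lower = [item.lower() for item in keywords]
# One capturing group wrapping an alternation of all keywords
pattern = '(' + '|'.join('(?:' + i + ')' for i in keywords_lower) + ')'
# extractall returns one row per match; [0] selects the captured group, then matches are joined per original row
df['Keyword'] = df['Description'].str.extractall(pattern, flags=re.I)[0].groupby(level=0).agg('/'.join)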
You will get:
   Description                      Keyword
0  Government entertainment people  Govern/entertain
1  Dinner with CFO                  Dinner
2  Commission to Agents government  Agent/govern
(Here, pattern is '((?:dinner)|(?:govern)|(?:agent)|(?:entertain))'.)
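As a self-contained sketch of the approach above: the three descriptions are taken from the output table, while the fourth row, the stem keyword list (inferred from the pattern shown), the use of `re.escape`, and the `reindex` step are assumptions added for illustration, not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({'Description': [
    'Government entertainment people',
    'Dinner with CFO',
    'Commission to Agents government',
    'Office supplies',  # hypothetical row that contains no keyword
]})
keywords = ['dinner', 'govern', 'agent', 'entertain']  # assumed from the pattern above

# re.escape guards against keywords that contain regex metacharacters
pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'

# extractall yields one row per match, indexed by (original row, match number);
# column 0 holds the text captured by the single group
matches = df['Description'].str.extractall(pattern, flags=re.IGNORECASE)[0]

# Join the matches of each original row; rows without any match get ''
df['Keyword'] = matches.groupby(level=0).agg('/'.join).reindex(df.index, fill_value='')
print(df)
```

Because the match is case-insensitive and the extracted text is the matched slice of the description, the keywords keep their original capitalization. Without the `reindex` step, rows with no match would end up as NaN instead of an empty string.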