
Partial keyword match not working when creating a new column from a pandas data frame in Python

I have a data frame with a Description column, as shown below:

  Description

I am trying to do a keyword search on the Description column, and I have the keywords in a list.

My current code finds only exact matches, not partial matches. If a row contains multiple keywords, they are joined with a delimiter and written to a new column.

My code:

import pandas as pd

data = pd.read_excel('path_to_datafile.xlsx')
keywords = ['dinner', 'government', 'Agents', 'entertainment', 'Agent']
keywords_lower = [item.lower() for item in keywords]
s = set(keywords_lower)
# Keeps only whitespace-separated words that equal a keyword exactly
data['Keyword'] = data['Description'].apply(lambda x: '/'.join(set(x.lower().split()).intersection(s)))
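
A quick way to see why this finds only exact matches: `split()` cuts the description at whitespace, and the set intersection keeps a token only when it equals a keyword in full. The sketch below uses a made-up description string (an assumption, not a row from the real data) and compares that with a plain substring test.

```python
# Keywords from the question, lowercased.
keywords = {"dinner", "government", "agents", "entertainment", "agent"}

# Hypothetical description, chosen so that punctuation sticks to the words.
description = "Commission to Agents, government-funded"

# Token intersection: "agents," and "government-funded" are not equal
# to any keyword, so nothing is found.
tokens = set(description.lower().split())
exact = tokens & keywords

# Substring test: a keyword counts as soon as it occurs anywhere in the text.
partial = [k for k in sorted(keywords) if k in description.lower()]

print(exact)
print(partial)
```

The substring test is only meant to illustrate the difference; it also reports overlapping keywords such as both `agent` and `agents`.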

How can this be done?

extractall will do the job, but you must first build the pattern:

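import re
...
keywords_lower = [item.lower() for item in keywords]
# One capturing group with a non-capturing alternative per keyword
pattern = '(' + '|'.join('(?:' + i + ')' for i in keywords_lower) + ')'
# extractall yields one row per match; join each original row's matches with '/'
df['Keyword'] = df['Description'].str.extractall(pattern, flags=re.I)[0].groupby(level=0).agg('/'.join)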

You would get:

                       Description           Keyword
0  Government entertainment people  Govern/entertain
1                  Dinner with CFO            Dinner
2  Commission to Agents government      Agent/govern

(Here pattern is '((?:dinner)|(?:govern)|(?:agent)|(?:entertain))', built from the keyword stems dinner, govern, agent and entertain.)
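
For reference, here is a self-contained sketch of the same approach that can be run without the Excel file. The sample rows are copied from the output above, plus one hypothetical row with no match. Escaping the keywords with `re.escape`, sorting them longest first, and filling unmatched rows with an empty string are additions of this sketch, not part of the original answer.

```python
import re

import pandas as pd

# Sample data modelled on the output above; the last row is hypothetical
# and is there to show what happens when nothing matches.
df = pd.DataFrame({
    "Description": [
        "Government entertainment people",
        "Dinner with CFO",
        "Commission to Agents government",
        "Nothing relevant here",
    ]
})

# Keyword stems taken from the pattern quoted above.
keywords = ["dinner", "govern", "agent", "entertain"]

# Escape each keyword so regex metacharacters are matched literally, and
# put longer keywords first so overlapping stems prefer the longest one.
ordered = sorted(keywords, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(k) for k in ordered) + ")"

# extractall returns one row per match, indexed by (original row, match number);
# column 0 holds the text captured by the single group.
matches = df["Description"].str.extractall(pattern, flags=re.IGNORECASE)[0]

# Join the matches of each original row; rows without a match get "".
df["Keyword"] = (
    matches.groupby(level=0).agg("/".join)
    .reindex(df.index, fill_value="")
)

print(df)
```

Because the match is case-insensitive and `extractall` returns the text as it appears in the description, the keywords keep their original capitalisation (for example `Govern`, not `govern`).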
