
Partial keyword match not working when creating a new column from a pandas DataFrame in Python

I have a data frame with a Description column, as shown below:

  Description

I am trying to do a keyword search on the Description column, and I have the keywords in a list.

My current code finds only exact matches, not partial matches. If a row contains multiple keywords, they should be joined with a delimiter and written to a new column.

My code:

import pandas as pd

data = pd.read_excel('path_to_datafile.xlsx')
keywords = ['dinner', 'government', 'Agents', 'entertainment', 'Agent']
keywords_lower = [item.lower() for item in keywords]
s = set(keywords_lower)
# Split each description into words and keep only the words that equal a keyword
data['Keyword'] = data['Description'].apply(lambda x: '/'.join(set(x.lower().split()).intersection(s)))

How can this be done?
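To illustrate why the code above finds only exact matches, here is a minimal sketch. The stem keywords `govern` and `agent` are hypothetical and chosen only to show the difference between comparing whole words and testing for substrings.

```python
keywords = {'govern', 'agent'}  # hypothetical stem keywords
tokens = set('Commission to Agents government'.lower().split())

# Set intersection compares whole tokens, so a stem never equals a longer word.
exact = tokens & keywords

# A substring test finds the stems inside 'agents' and 'government'.
partial = sorted(k for k in keywords if any(k in t for t in tokens))

print(exact)    # empty set
print(partial)  # both stems are found
```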

extractall will do the job, but you must first build the pattern:

...
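import re
keywords_lower = [item.lower() for item in keywords]
# One capture group that contains every keyword as an alternative
pattern = '(' + '|'.join('(?:' + i + ')' for i in keywords_lower) + ')'
# extractall yields one row per match; join the matches of each original row
df['Keyword'] = df['Description'].str.extractall(pattern, flags=re.I)[0].groupby(level=0).agg('/'.join)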

You would get:

                       Description           Keyword
0  Government entertainment people  Govern/entertain
1                  Dinner with CFO            Dinner
2  Commission to Agents government      Agent/govern

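(Here the pattern is '((?:dinner)|(?:govern)|(?:agent)|(?:entertain))', so the keywords used for this output were the stems dinner, govern, agent and entertain, which also match inside longer words.)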
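For reference, a self-contained sketch of this approach might look like the following. The three descriptions are taken from the output table above; the fourth row, the stem keyword list, the use of `re.escape`, and the `reindex` step for rows without a match are assumptions added for illustration and are not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({'Description': [
    'Government entertainment people',
    'Dinner with CFO',
    'Commission to Agents government',
    'Quarterly report',  # hypothetical row that contains no keyword
]})
keywords = ['dinner', 'govern', 'agent', 'entertain']  # assumed stem keywords

# Escape each keyword so regex metacharacters are matched literally.
pattern = '(' + '|'.join(re.escape(k.lower()) for k in keywords) + ')'

# One row per match, indexed by (original row, match number); column 0 holds the text.
matches = df['Description'].str.extractall(pattern, flags=re.IGNORECASE)
joined = matches[0].groupby(level=0).agg('/'.join)

# Rows without any match are absent from `joined`; reindex fills them with ''.
df['Keyword'] = joined.reindex(df.index, fill_value='')
print(df)
```

The matched text keeps the casing found in the description, which is why the table shows Govern and Agent rather than the lowercase keywords. Without the `reindex` step, rows that contain no keyword would end up as NaN in the new column.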


 