I have a data frame with a Description column, as shown below:
Description
Government entertainment people
Dinner with CFO
Commission to Agents government
I am trying to do a keyword search on the Description column, and I have the keywords in a list.
My current code finds only exact (whole-word) matches, not partial matches. If a row contains multiple keywords, they should be joined with a delimiter and written to a new column.
My code:
import pandas as pd
data = pd.read_excel('path_to_datafile.xlsx')
keywords = ['dinner', 'government', 'Agents', 'entertainment', 'Agent']
keywords_lower = [item.lower() for item in keywords]
s = set(keywords_lower)
# keep only the whitespace-separated words that exactly equal a keyword
data['Keyword'] = data['Description'].apply(lambda x: '/'.join(set(x.lower().split()).intersection(s)))
How can I make this catch partial matches as well?
extractall will do the job, but you must first build the pattern:
...
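import re
keywords_lower = [item.lower() for item in keywords]
# escape each keyword so that regex metacharacters are matched literally
pattern = '(' + '|'.join('(?:' + re.escape(i) + ')' for i in keywords_lower) + ')'
# extractall yields one row per match; join the matches of each original row with '/'
data['Keyword'] = data['Description'].str.extractall(pattern, flags=re.I)[0].groupby(level=0).agg('/'.join)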
You would get:
                       Description           Keyword
0  Government entertainment people  Govern/entertain
1                  Dinner with CFO            Dinner
2  Commission to Agents government      Agent/govern
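(Here pattern is '((?:dinner)|(?:govern)|(?:agent)|(?:entertain))', that is, the keywords were shortened to the stems dinner, govern, agent and entertain so that partial matches are found.)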
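For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same extractall approach. The sample rows, the extra "Quarterly audit" row, the shortened keyword stems and the helper name build_pattern are assumptions made for illustration, not part of the original question. It also differs from the snippet above in three small ways, which you may or may not want: keywords are sorted longest first (regex alternation takes the first alternative that matches, so 'agents' should be tried before 'agent' if both are listed), hits are lowercased and de-duplicated per row, and rows without any hit get an empty string instead of NaN.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the rows shown above
data = pd.DataFrame({
    "Description": [
        "Government entertainment people",
        "Dinner with CFO",
        "Commission to Agents government",
        "Quarterly audit",  # extra row with no keyword, added for illustration
    ]
})

# Assumed keyword stems, shortened so that partial matches are found
keywords = ["dinner", "govern", "Agent", "entertain"]


def build_pattern(words):
    """Return a regex with one capture group that matches any of the words."""
    # Lowercase, de-duplicate, and try longer keywords before shorter ones
    unique = sorted({w.lower() for w in words}, key=lambda w: (-len(w), w))
    return "(" + "|".join(re.escape(w) for w in unique) + ")"


pattern = build_pattern(keywords)

# One row per match, indexed by (original row, match number); column 0 is the group
matches = data["Description"].str.extractall(pattern, flags=re.IGNORECASE)[0]

# Lowercase the hits, drop repeats within a row (keeping first-seen order), join with '/'
joined = matches.str.lower().groupby(level=0).agg(lambda s: "/".join(dict.fromkeys(s)))

# Rows without any match are missing from `joined`; give them an empty string
data["Keyword"] = joined.reindex(data.index, fill_value="")

print(data)
```

If you prefer to keep the original casing of each hit, as in the output shown above, drop the .str.lower() call.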