I have two dataframes such as:
df1:
Category Keywords
0 Fruit ['apple', 'pear', 'plum', 'grape']
1 Color ['red', 'purple', 'green']
df2:
Items
0 plum
1 purple
2 pear
3 orange
4 apple
5 rainbow
Whenever a value in df2 matches a keyword from the keyword lists in df1, I want to MOVE it into a new list or dataframe; that is, the matched values are removed from df2 and placed in df3. The result should be as follows:
df2:
Items
0 orange
1 rainbow
df3:
Items
0 plum
1 purple
2 pear
3 apple
or as a list of items: [plum, purple, pear, apple]
A similar but not exact question would be: Use keywords from dataframe to detect if any present in another dataframe or string
EDIT: items such as "pears" or "pearl" should still be identified for the keyword "pear"
items_list = df1['Keywords'].tolist()
items_list = [item for sub_list in items_list for item in sub_list]
# df3 receives the items found in the keyword list; df2 keeps the rest
df3 = df2.loc[df2['Items'].isin(items_list)]
df2 = df2.loc[~df2['Items'].isin(items_list)]
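For reference, here is a self-contained sketch of the isin() approach on the sample data from the question. It assumes the matched items should end up in df3 and the remainder in df2, as the question asks. The reset_index(drop=True) calls are an addition to renumber the rows from 0, as in the expected output. Note that isin() compares whole values, so it does not cover the EDIT case ("pears", "pearl").

```python
import pandas as pd

df1 = pd.DataFrame({
    'Category': ['Fruit', 'Color'],
    'Keywords': [['apple', 'pear', 'plum', 'grape'],
                 ['red', 'purple', 'green']],
})
df2 = pd.DataFrame({'Items': ['plum', 'purple', 'pear',
                              'orange', 'apple', 'rainbow']})

# Flatten the per-category keyword lists into one flat list
items_list = [kw for kws in df1['Keywords'] for kw in kws]

# Exact, whole-value membership test
found = df2['Items'].isin(items_list)

# "Move" the matched rows to df3 and keep the rest in df2
df3 = df2[found].reset_index(drop=True)
df2 = df2[~found].reset_index(drop=True)

# The moved items as a plain list, if a list is preferred
moved = df3['Items'].tolist()
print(moved)
print(df2)
```

Computing the mask once before reassigning df2 avoids evaluating isin() against an already filtered frame.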
You can use str.contains() to test each item against a regex that joins the keywords with |. Also, I am using explode() to flatten the keyword lists into a single list.
import pandas as pd
c = ['Category','Keywords']
d = [['Fruit',['apple', 'pear', 'plum', 'grape']],
['Color',['red', 'purple', 'green']]]
df1 = pd.DataFrame(d,columns=c)
df2 = pd.DataFrame({'Items':['plum','purple','pear','orange',
'apple','rainbow','pearl','pears',
'peary','pineapple','plumber']})
print (df1)
print (df2)
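# Flatten the keyword lists into one list (a single explode() is enough)
keywords = df1.Keywords.explode().to_list()
# Build an alternation pattern such as 'apple|pear|...'; no capture group is needed
pattern = '|'.join(keywords)
# True for every item that contains any keyword as a substring
mask = df2.Items.str.contains(pattern)
# Move the matching rows to df3 and keep the rest in df2
df3 = df2[mask]
df2 = df2[~mask]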
print (df2)
print (df3)
This will give you:
Original df1:
Category Keywords
0 Fruit [apple, pear, plum, grape]
1 Color [red, purple, green]
Original df2:
Items
0 plum
1 purple
2 pear
3 orange
4 apple
5 rainbow
6 pearl
7 pears
8 peary
9 pineapple
10 plumber
New df3: contains all the items that matched a keyword
Items
0 plum
1 purple
2 pear
4 apple
6 pearl
7 pears
8 peary
9 pineapple
10 plumber
Updated df2:
Items
3 orange
5 rainbow
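The substring match above also moves "pineapple" (it contains "apple") and "plumber" (it contains "plum"). If only items that start with a keyword should match, as the "pears"/"pearl" examples in the EDIT suggest, one possible variant anchors the pattern at the start of the string. This is an assumption about the intended matching rule, not something the question states. The sketch below also applies re.escape() to each keyword so that any regex metacharacters in a keyword are treated literally. Under this rule "pineapple" stays in df2, while "plumber" is still moved because it starts with "plum".

```python
import re
import pandas as pd

keywords = ['apple', 'pear', 'plum', 'grape', 'red', 'purple', 'green']
df2 = pd.DataFrame({'Items': ['plum', 'purple', 'pear', 'orange',
                              'apple', 'rainbow', 'pearl', 'pears',
                              'peary', 'pineapple', 'plumber']})

# Assumed rule: an item matches only if it STARTS with one of the keywords.
# (?:...) is a non-capturing group, so ^ applies to every alternative.
pattern = r'^(?:{})'.format('|'.join(re.escape(k) for k in keywords))
mask = df2['Items'].str.contains(pattern, regex=True)

# Move the matching rows to df3 and keep the rest in df2
df3 = df2[mask]
df2 = df2[~mask]

print(df3['Items'].tolist())
print(df2['Items'].tolist())
```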