Setting value for dataframe column from another dataframe based on condition

Question

I have one dataframe

import pandas as pd

#Around 100000 rows
df = pd.DataFrame({'text':    [ 'Apple is healthy',  'Potato is round', 'Apple might be green'],
                   'category': ["","", ""],
                   })

A second dataframe

#Around 3000 rows
df_2 = pd.DataFrame({'keyword':    [ 'Apple ',  'Potato'],
                   'category': ["fruit","vegetable"],
                   })

The required result

#Around 100000 rows
df = pd.DataFrame({'text':    [ 'Apple is healthy',  'Potato is round', 'Apple might be green'],
                   'category': ["fruit","vegetable", "fruit"],
                   })

What I tried:

df.set_index('text')
df_2.set_index('keyword')
df.update(df_2)

The result is:

text                  category
Apple is healthy      fruit
Potato is round       vegetable
Apple might be green

As you can see, it does not add a category for the last row. How can I achieve that?
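The reason only the first two rows were filled can be seen in a small sketch that reuses the sample data above. `set_index` returns a new DataFrame rather than modifying `df`, so both frames keep their default integer index, and `df.update(df_2)` aligns them on the labels 0 and 1 and on the shared column `category`. Row 2 has no counterpart in `df_2` and keeps its empty string. This is an illustration of the mechanism, not a reproduction of the asker's exact session.

```python
import pandas as pd

df = pd.DataFrame({'text': ['Apple is healthy', 'Potato is round', 'Apple might be green'],
                   'category': ["", "", ""]})
df_2 = pd.DataFrame({'keyword': ['Apple ', 'Potato'],
                     'category': ["fruit", "vegetable"]})

# set_index returns a new DataFrame; the result is discarded here,
# so df still has its default RangeIndex 0, 1, 2
df.set_index('text')
print(df.index.tolist())

# update works in place and aligns on index labels and column names:
# labels 0 and 1 exist in both frames, label 2 only in df
df.update(df_2)
print(df['category'].tolist())
```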

Answer 1

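You need to assign back the output of DataFrame.set_index, because unlike DataFrame.update it is not an in-place operation. The index labels must also match exactly, and a full text such as 'Apple is healthy' never equals a keyword, so use Series.str.extract with a pattern built from the column df_2["keyword"] to pull the matching keyword out of each text and use it as the index: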

df = df.set_index(df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False))
df_2 = df_2.set_index('keyword')
print (df)
                        text category
text                                 
Apple       Apple is healthy         
Potato       Potato is round         
Apple   Apple might be green  

df.update(df_2)
print (df)
                        text   category
text                                   
Apple       Apple is healthy      fruit
Potato       Potato is round  vegetable
Apple   Apple might be green      fruit
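One caveat with building the pattern as `"|".join(df_2["keyword"])`: the keywords are interpreted as regular expressions and can match inside longer words. A hedged variant, assuming the keywords are meant as literal whole words, escapes each one with `re.escape` and wraps the alternation in `\b` word boundaries. The keywords `Pea` and `Apple` and the texts below are hypothetical, chosen only to show the substring problem; `\b` also assumes each keyword starts and ends with a word character.

```python
import re
import pandas as pd

# Hypothetical data: 'Pea' is also a prefix of 'Peach'
keywords = pd.Series(['Pea', 'Apple'])
texts = pd.Series(['Peach is sweet', 'Pea soup is warm', 'Apple is healthy'])

# Pattern built as in the answer: a plain alternation, which also matches inside words
naive = texts.str.extract(f'({"|".join(keywords)})', expand=False)
print(naive.tolist())

# Escape each keyword literally and require word boundaries on both sides;
# 'Peach is sweet' now has no match, so its entry is missing (NaN)
escaped = "|".join(re.escape(k) for k in keywords)
bounded = texts.str.extract(rf'\b({escaped})\b', expand=False)
print(bounded.tolist())
```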

If you only need to fill one column, start again from the original df and df_2 and use Series.str.extract with Series.map:

s = df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False)
df['category'] = s.map(df_2.set_index(['keyword'])['category'])
print (df)
                   text   category
0      Apple is healthy      fruit
1       Potato is round  vegetable
2  Apple might be green      fruit
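For reference, the `str.extract` + `map` approach can be run as a self-contained script. It uses the question's sample data plus one hypothetical row, 'Carrot is orange', that contains no keyword. Such rows get NaN from `str.extract` and therefore from `map`; filling them with an empty string via `fillna('')` is an assumption about the desired default. If a text contains several keywords, `str.extract` keeps only the first match.

```python
import pandas as pd

# Question's data plus a hypothetical row with no matching keyword
df = pd.DataFrame({'text': ['Apple is healthy', 'Potato is round',
                            'Apple might be green', 'Carrot is orange'],
                   'category': ["", "", "", ""]})
df_2 = pd.DataFrame({'keyword': ['Apple ', 'Potato'],
                     'category': ["fruit", "vegetable"]})

# One alternation of all keywords; the capture group returns the matched keyword
pattern = f'({"|".join(df_2["keyword"])})'
matched = df['text'].str.extract(pattern, expand=False)

# keyword -> category lookup; unmatched rows map to NaN and are then set to ''
lookup = df_2.set_index('keyword')['category']
df['category'] = matched.map(lookup).fillna('')
print(df)
```

Because `map` with a Series looks up each extracted keyword in a hash-based index, this stays a single vectorised pass over the ~100000 texts rather than a per-keyword loop.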

solution1: 0, accepted, 2020-09-16 10:08:32