I have one dataframe
#Around 100000 rows
df = pd.DataFrame({'text': [ 'Apple is healthy', 'Potato is round', 'Apple might be green'],
'category': ["","", ""],
})
A second dataframe
#Around 3000 rows
df_2 = pd.DataFrame({'keyword': [ 'Apple ', 'Potato'],
'category': ["fruit","vegetable"],
})
The required result
#Around 100000 rows
df = pd.DataFrame({'text': [ 'Apple is healthy', 'Potato is round', 'Apple might be green'],
'category': ["fruit","vegetable", "fruit"],
})
This is what I have tried so far:
df.set_index('text')
df_2.set_index('keyword')
df.update(df_2)
The result is
text                  category
Apple is healthy      fruit
Potato is round       vegetable
Apple might be green
As you can see, it does not add a category for the last row. How can I achieve that?
You need to assign back the output of DataFrame.set_index, because, unlike DataFrame.update, it is not an in-place operation. For the matching, use Series.str.extract with a pattern built from the column df_2["keyword"]:
df = df.set_index(df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False))
df_2 = df_2.set_index('keyword')
print(df)
                        text category
text
Apple       Apple is healthy
Potato       Potato is round
Apple   Apple might be green
df.update(df_2)
print(df)
                        text   category
text
Apple       Apple is healthy      fruit
Potato       Potato is round  vegetable
Apple   Apple might be green      fruit
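The point that set_index returns a new object instead of modifying the frame in place can be checked with a small sketch. The variable name indexed is only illustrative, and the data is a shortened copy of the frame from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Apple is healthy", "Potato is round"],
    "category": ["", ""],
})

# set_index returns a new DataFrame; df itself is left untouched
indexed = df.set_index("text")

print(df.index.tolist())       # still the default integer index
print(indexed.index.tolist())  # the values of the 'text' column
```

This is why calling df.set_index('text') on its own line, as in the question, has no effect on the later df.update(df_2) call.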
If you only need to fill one column, use Series.str.extract together with Series.map:
s = df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False)
df['category'] = s.map(df_2.set_index(['keyword'])['category'])
print(df)
                   text   category
0      Apple is healthy      fruit
1       Potato is round  vegetable
2  Apple might be green      fruit
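With around 3000 real keywords, some may contain regex metacharacters or be prefixes of other keywords. The following is a minimal, self-contained sketch of the same extract-and-map idea with two defensive steps added. It assumes that whitespace around keywords is not significant and that keywords are unique after stripping; neither assumption comes from the question, and the names lookup, keywords, pattern and matched are illustrative:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "text": ["Apple is healthy", "Potato is round", "Apple might be green"],
    "category": ["", "", ""],
})
df_2 = pd.DataFrame({
    "keyword": ["Apple ", "Potato"],
    "category": ["fruit", "vegetable"],
})

# Assumption: surrounding whitespace in keywords is not significant,
# and keywords are unique once stripped (Series.map needs a unique index)
lookup = (
    df_2.assign(keyword=df_2["keyword"].str.strip())
        .set_index("keyword")["category"]
)

# Escape regex metacharacters and try longer keywords first,
# so a short keyword cannot shadow a longer one that starts with it
keywords = sorted(lookup.index, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"

# First matching keyword per row, then translate keyword -> category
matched = df["text"].str.extract(pattern, expand=False)
df["category"] = matched.map(lookup)
print(df)
```

Rows whose text contains none of the keywords end up with NaN in category, so they can be filled or inspected afterwards.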