Setting value for dataframe column from another dataframe based on condition
I have a dataframe:
import pandas as pd

# Around 100000 rows
df = pd.DataFrame({'text': ['Apple is healthy', 'Potato is round', 'Apple might be green'],
                   'category': ["", "", ""],
                   })
And a second dataframe:
#Around 3000 rows
df_2 = pd.DataFrame({'keyword': [ 'Apple ', 'Potato'],
'category': ["fruit","vegetable"],
})
Desired result:
#Around 100000 rows
df = pd.DataFrame({'text': [ 'Apple is healthy', 'Potato is round', 'Apple might be green'],
'category': ["fruit","vegetable", "fruit"],
})
This is what I have tried so far:
df.set_index('text')
df_2.set_index('keyword')
df.update(df_2)
The result is:
text                  category
Apple is healthy      fruit
Potato is round       vegetable
Apple might be green
As you can see, it did not add a category to the last row. How can I achieve this?
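You need to assign the output of DataFrame.set_index back, because unlike DataFrame.update it is not an in-place operation. To get index labels that match df_2, use Series.str.extract with a pattern built from the column df_2["keyword"]: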
df = df.set_index(df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False))
df_2 = df_2.set_index('keyword')
print(df)
                        text  category
text
Apple       Apple is healthy
Potato       Potato is round
Apple   Apple might be green
df.update(df_2)
print(df)
                        text   category
text
Apple       Apple is healthy      fruit
Potato       Potato is round  vegetable
Apple   Apple might be green      fruit
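To see why the attempt in the question filled only the first two rows, here is a minimal sketch using the question's own data. The variable names index_unchanged and categories_after_update are only illustrative. It shows that set_index without assignment leaves df untouched, so update then aligns on the default integer labels 0, 1, 2, and df_2 only has labels 0 and 1.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Apple is healthy", "Potato is round", "Apple might be green"],
    "category": ["", "", ""],
})
df_2 = pd.DataFrame({
    "keyword": ["Apple ", "Potato"],
    "category": ["fruit", "vegetable"],
})

# set_index returns a new DataFrame; the result is discarded here,
# so df still has its default integer index.
df.set_index("text")
index_unchanged = list(df.index) == [0, 1, 2]

# update() modifies df in place and aligns on index labels.
# df_2 only has labels 0 and 1, so row 2 is never touched.
df.update(df_2)
categories_after_update = df["category"].tolist()
print(categories_after_update)
```

The fact that the first two rows look right is a coincidence of row order, not the result of any keyword matching.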
If you only need to add one column, use Series.str.extract and Series.map:
s = df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False)
df['category'] = s.map(df_2.set_index(['keyword'])['category'])
print(df)
                   text   category
0      Apple is healthy      fruit
1       Potato is round  vegetable
2  Apple might be green      fruit
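The extract-and-map approach can be made a little more defensive. The sketch below is not part of the original answer and makes a few assumptions: keywords should be matched literally (so they are passed through re.escape), the trailing space in 'Apple ' is unintentional (so keywords are stripped), and rows without any keyword should keep an empty category. The fourth row, 'Stone is hard', is hypothetical data added only to show the no-match case.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "text": ["Apple is healthy", "Potato is round",
             "Apple might be green", "Stone is hard"],  # last row: hypothetical no-match case
    "category": ["", "", "", ""],
})
df_2 = pd.DataFrame({
    "keyword": ["Apple ", "Potato"],
    "category": ["fruit", "vegetable"],
})

# Assumption: surrounding whitespace in keywords is not significant.
keywords = df_2["keyword"].str.strip()

# Escape each keyword so characters such as '.' or '+' are matched literally.
pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"

# First keyword found in each text, NaN where nothing matches.
matched = df["text"].str.extract(pattern, expand=False)

# Lookup Series: cleaned keyword -> category.
lookup = df_2.assign(keyword=keywords).set_index("keyword")["category"]

# Map matches to categories; unmatched rows fall back to an empty string.
df["category"] = matched.map(lookup).fillna("")
print(df)
```

If some keywords are prefixes of others, sorting them by length in descending order before joining may be needed so the longer keyword is tried first.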