
Setting value for dataframe column from another dataframe based on condition

I have a dataframe:

import pandas as pd

# Around 100000 rows
df = pd.DataFrame({'text':    [ 'Apple is healthy',  'Potato is round', 'Apple might be green'],
                   'category': ["","", ""],
                   })

And a second dataframe:

#Around 3000 rows
df_2 = pd.DataFrame({'keyword':    [ 'Apple ',  'Potato'],
                   'category': ["fruit","vegetable"],
                   })

The result I need:

#Around 100000 rows
df = pd.DataFrame({'text':    [ 'Apple is healthy',  'Potato is round', 'Apple might be green'],
                   'category': ["fruit","vegetable", "fruit"],
                   })

This is what I have tried so far:

df.set_index('text')
df_2.set_index('keyword')
df.update(df_2)

The result is:

text    category
Apple is healthy    fruit
Potato is round vegetable
Apple might be green

As you can see, it did not add a category for the last row. How can I achieve this?

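You need to assign the output of DataFrame.set_index back, because unlike DataFrame.update it is not an in-place operation. To match the keywords, use Series.str.extract with a pattern built from the column df_2["keyword"]: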

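# Index df by the keyword found in each text (keywords joined into one regex alternation)
df = df.set_index(df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False))
# Index df_2 by keyword so update can align both frames on the index
df_2 = df_2.set_index('keyword')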
print (df)
                        text category
text                                 
Apple       Apple is healthy         
Potato       Potato is round         
Apple   Apple might be green  



df.update(df_2)
print (df)
                        text   category
text                                   
Apple       Apple is healthy      fruit
Potato       Potato is round  vegetable
Apple   Apple might be green      fruit
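
To illustrate why the original attempt left the index unchanged, here is a minimal sketch using the question's df_2. The names index_before and indexed are only illustrative and do not appear in the original post.

```python
import pandas as pd

df_2 = pd.DataFrame({'keyword': ['Apple ', 'Potato'],
                     'category': ['fruit', 'vegetable']})

# set_index returns a new DataFrame; the result is discarded here,
# so df_2 keeps its default RangeIndex.
df_2.set_index('keyword')
index_before = df_2.index

# Assigning the result to a name is what gives a keyword-indexed frame.
indexed = df_2.set_index('keyword')

print(index_before)
print(indexed.index)
```

Because df.update aligns rows on the index, calling it while both frames still carry their default integer index pairs rows by position instead of by keyword.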

If you only need to add one column, use Series.str.extract together with Series.map:

s = df['text'].str.extract(f'({"|".join(df_2["keyword"])})', expand=False)
df['category'] = s.map(df_2.set_index(['keyword'])['category'])
print (df)
                   text   category
0      Apple is healthy      fruit
1       Potato is round  vegetable
2  Apple might be green      fruit
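
The extract-and-map approach can be wrapped in a small helper. The sketch below is one possible variant, not part of the original answer: the function name assign_category, the re.escape call (in case a keyword contains regex metacharacters), the longest-first ordering, the empty-string fallback for unmatched rows, and the extra 'Carrot is orange' row are all assumptions added for illustration. It also assumes the keywords in df_2 are unique.

```python
import re
import pandas as pd

def assign_category(df, df_2):
    """Return a copy of df with 'category' filled from df_2 by keyword match."""
    # Try longer keywords first so a short keyword does not shadow a longer one.
    keywords = sorted(df_2['keyword'], key=len, reverse=True)
    # Escape each keyword so it is matched literally, then build one alternation.
    pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'
    # First matching keyword per row; NaN where no keyword occurs.
    matched = df['text'].str.extract(pattern, expand=False)
    # keyword -> category lookup (requires unique keywords).
    lookup = df_2.set_index('keyword')['category']
    out = df.copy()
    out['category'] = matched.map(lookup).fillna('')
    return out

df = pd.DataFrame({'text': ['Apple is healthy', 'Potato is round',
                            'Apple might be green', 'Carrot is orange'],
                   'category': ['', '', '', '']})
df_2 = pd.DataFrame({'keyword': ['Apple ', 'Potato'],
                     'category': ['fruit', 'vegetable']})

result = assign_category(df, df_2)
print(result)
```

Rows whose text contains none of the keywords keep an empty category here; dropping the fillna call would leave them as NaN instead.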

