Pandas: Updating multiple columns in a DataFrame based on values from another DataFrame
How to extract a pandas DataFrame from another DataFrame based on multiple columns?
I have two pandas DataFrames, as follows:
df1
Type       season  name         qty
Fruit      summer  Mango        12
Fruit      summer  watermelon   23
Fruit      summer  blueberries  200
vegetable  summer  Peppers      24
df2
Availability  season  name      city
YEs           summer  Mango     Pune
Yes           summer  Peppers   Mumbai
Yes           summer  Tomatoes  Mumbai
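I want to compare the season and name columns of df2 against df1 and mark the matching rows by adding an extra column named status to df1 (1 for a match, 0 for no match). In this case the result should look like this: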
df1
Type       season  name         qty  status
Fruit      summer  Mango        12   1
Fruit      summer  watermelon   23   0
Fruit      summer  blueberries  200  0
vegetable  summer  Peppers      24   1
Here is another option, using merge with how='left':
df1.merge(
df2[['season', 'name']].assign(status=1),
how='left').fillna(0)
Output:
        Type  season         name  qty  status
0      Fruit  summer        Mango   12     1.0
1      Fruit  summer   watermelon   23     0.0
2      Fruit  summer  blueberries  200     0.0
3  vegetable  summer      Peppers   24     1.0
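Note that the merge above yields a float status column (1.0 / 0.0), because the unmatched rows pass through NaN before fillna. If an integer flag is wanted, one possible variant is to let merge report the match itself through indicator=True. This is only a sketch, not part of the original answer: the keys, merged and result names are mine, and the drop_duplicates call is an added precaution so that repeated (season, name) pairs in df2 cannot multiply rows of df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

keys = ["season", "name"]  # columns that must match as a pair

# Left-merge on the key pair; indicator=True adds a "_merge" column
# whose value is "both" for rows found in df2 and "left_only" otherwise.
merged = df1.merge(
    df2[keys].drop_duplicates(),  # assumption: guard against duplicate pairs
    on=keys,
    how="left",
    indicator=True,
)

# Turn the indicator into an integer 1/0 flag and drop the helper column.
merged["status"] = (merged["_merge"] == "both").astype(int)
result = merged.drop(columns="_merge")

print(result["status"].tolist())  # -> [1, 0, 0, 1]
```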
You can use .isin in the following way:
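# Build one (season, name) tuple per row of df1 ...
df1["status"] = list(zip(df1["season"], df1["name"]))
# ... then test each tuple for membership among df2's (season, name) pairs
df1["status"] = df1["status"].isin(list(zip(df2["season"], df2["name"])))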
Output
df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12    True
1      Fruit  summer   watermelon   23   False
2      Fruit  summer  blueberries  200   False
3  vegetable  summer      Peppers   24    True
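The result above is boolean, while the question asked for 1 and 0. A rough sketch of one way to get there, assuming the same sample data as in the question: build a MultiIndex from the key columns of each frame, test pairwise membership with MultiIndex.isin, and cast the mask to int. This swaps the zip-based tuples for a MultiIndex, so treat it as an alternative rather than the method used in this answer; the keys and mask names are mine.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

keys = ["season", "name"]  # columns compared together, as a pair

# Each MultiIndex entry is a (season, name) tuple, so isin compares
# whole pairs rather than each column on its own.
mask = pd.MultiIndex.from_frame(df1[keys]).isin(
    pd.MultiIndex.from_frame(df2[keys])
)

# mask is a boolean NumPy array aligned with df1's rows; cast it to 1/0.
df1["status"] = mask.astype(int)

print(df1["status"].tolist())  # -> [1, 0, 0, 1]
```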
Performance (compared with @perl's answer)
data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
'qty': {0: 12, 1: 23, 2: 200, 3: 24}}
#@perl's answer
%%timeit
df1 = pd.DataFrame(data)
df1.merge(
df2[['season', 'name']].assign(status=1),
how='left').fillna(0)
#5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))
#434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Old (and wrong) answer
You can use .isin together with .to_dict:
cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(axis=1).astype('int')
Output
df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12       1
1      Fruit  summer   watermelon   23       0
2      Fruit  summer  blueberries  200       0
3  vegetable  summer      Peppers   24       1
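As far as I can tell, this version is wrong because .isin with a dict checks each column independently: a row passes whenever its season appears anywhere in df2's season column and its name appears anywhere in df2's name column, even if that exact pair never occurs together. The sample data hides this because every row has season summer. A small illustration with made-up data (the winter rows below are hypothetical and not from the question):

```python
import pandas as pd

# Hypothetical data chosen to expose the problem:
# df2 contains (summer, Mango) and (winter, Peppers), but NOT (winter, Mango).
df1 = pd.DataFrame({"season": ["winter"], "name": ["Mango"]})
df2 = pd.DataFrame({"season": ["summer", "winter"],
                    "name": ["Mango", "Peppers"]})

cols = ["season", "name"]

# Column-wise check (the old answer): "winter" is in df2.season and
# "Mango" is in df2.name, so the row is flagged as a match.
columnwise = df1[cols].isin(df2[cols].to_dict("list")).all(axis=1).astype(int)

# Pair-wise check: the tuple (winter, Mango) must exist as a whole in df2.
pairwise = pd.MultiIndex.from_frame(df1[cols]).isin(
    pd.MultiIndex.from_frame(df2[cols])
).astype(int)

print(columnwise.tolist())  # -> [1]  (false positive)
print(pairwise.tolist())    # -> [0]  (correct)
```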