How to extract a pandas DataFrame from another DataFrame based on multiple columns?
I have two pandas DataFrames as below:
df1
Type season name qty
Fruit summer Mango 12
Fruit summer watermelon 23
Fruit summer blueberries 200
vegetable summer Peppers 24
df2
Availability season name city
YEs summer Mango Pune
Yes summer Peppers Mumbai
Yes summer Tomatoes Mumbai
I want to compare the season and name columns of df2 with df1 and add an extra column called status to df1, where 1 marks a row whose (season, name) pair also appears in df2 and 0 marks a row with no match. In this case the result should look like this:
df1
Type season name qty status
Fruit summer Mango 12 1
Fruit summer watermelon 23 0
Fruit summer blueberries 200 0
vegetable summer Peppers 24 1
Here's another option using merge with how='left':
df1.merge(
    df2[['season', 'name']].assign(status=1),
    how='left').fillna(0)
Output:
Type season name qty status
0 Fruit summer Mango 12 1.0
1 Fruit summer watermelon 23 0.0
2 Fruit summer blueberries 200 0.0
3 vegetable summer Peppers 24 1.0
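One caveat with the merge above: fillna(0) fills every NaN in the result, not only those in status, and status comes back as float. Below is a minimal sketch of a variant, assuming the indicator=True option of merge and the sample data from the question. The names keys and merged are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

keys = ["season", "name"]  # illustrative name for the join columns

# drop_duplicates keeps repeated key pairs in df2 from duplicating df1 rows
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how="left", indicator=True)

# _merge is "both" where the key pair exists in df2, "left_only" otherwise
merged["status"] = (merged["_merge"] == "both").astype(int)
merged = merged.drop(columns="_merge")
print(merged)
```

This leaves the other columns untouched and gives an integer status column, which is closer to the 1/0 format the question asks for.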
You can use .isin in the following way:
# Build one (season, name) tuple per row, then test it against the pairs in df2
df1["status"] = list(zip(df1["season"], df1["name"]))
df1["status"] = df1["status"].isin(list(zip(df2["season"], df2["name"])))
Output:
df1
Type season name qty status
0 Fruit summer Mango 12 True
1 Fruit summer watermelon 23 False
2 Fruit summer blueberries 200 False
3 vegetable summer Peppers 24 True
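The result above is boolean, while the question asks for 1 and 0. A possible variant, sketched here under the assumption that building a MultiIndex from the key columns is acceptable, does the same pair-wise membership test and casts it to integers. The names keys, left_keys and right_keys are illustrative only.

```python
import pandas as pd

# Sample data copied from the question (only the columns needed here for df2)
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
})

keys = ["season", "name"]

# Each MultiIndex entry is one (season, name) pair
left_keys = pd.MultiIndex.from_frame(df1[keys])
right_keys = pd.MultiIndex.from_frame(df2[keys])

# isin returns a boolean array; astype(int) turns it into 1/0
df1["status"] = left_keys.isin(right_keys).astype(int)
print(df1)
```

If the boolean column is already in place, calling .astype(int) on it should give the same 1/0 values.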
Performance (vs. @perl's answer)
data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
'qty': {0: 12, 1: 23, 2: 200, 3: 24}}
#@perl's answer
%%timeit
df1 = pd.DataFrame(data)
df1.merge(
df2[['season', 'name']].assign(status=1),
how='left').fillna(0)
#5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))
#434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
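Timings on a four-row frame are likely dominated by fixed overhead, so it may be worth repeating the comparison on larger input before drawing conclusions. The following is only a sketch using the standard timeit module on synthetic data. The frame names big1 and big2, the size n and the helper functions are all hypothetical, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000  # hypothetical size, adjust as needed

# Synthetic stand-ins for df1 and df2
big1 = pd.DataFrame({
    "season": rng.choice(["summer", "winter"], n),
    "name": rng.choice(["Mango", "watermelon", "blueberries", "Peppers", "Tomatoes"], n),
    "qty": rng.integers(1, 100, n),
})
big2 = pd.DataFrame({
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
})


def with_merge():
    # Left merge on the shared columns, then fill only the status column
    out = big1.merge(big2.drop_duplicates().assign(status=1), how="left")
    return out["status"].fillna(0).astype(int)


def with_isin():
    # Pair-wise membership test on (season, name) tuples
    pairs = pd.Series(list(zip(big1["season"], big1["name"])), index=big1.index)
    return pairs.isin(list(zip(big2["season"], big2["name"]))).astype(int)


# Both approaches should agree before their speed is compared
assert with_merge().tolist() == with_isin().tolist()

runs = 5
for fn in (with_merge, with_isin):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per call")
```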
Old (and wrong) answer
You can use .isin with .to_dict:
cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(axis=1).astype('int')
Output:
df1
Type season name qty status
0 Fruit summer Mango 12 1
1 Fruit summer watermelon 23 0
2 Fruit summer blueberries 200 0
3 vegetable summer Peppers 24 1
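The output happens to be right for the sample data, but this approach tests each column on its own, so a row can be flagged as a match when its season and its name come from different rows of df2. A small counterexample, using made-up frames named left and right (not the data from the question), illustrates the difference from a pair-wise check:

```python
import pandas as pd

# Hypothetical data chosen to expose the problem
left = pd.DataFrame({"season": ["summer", "winter"], "name": ["Mango", "Mango"]})
right = pd.DataFrame({"season": ["summer", "winter"], "name": ["Mango", "Peppers"]})
cols = ["season", "name"]

# Column-wise: "winter" is in right.season and "Mango" is in right.name,
# so the second row passes even though (winter, Mango) is not a row of right
columnwise = left[cols].isin(right[cols].to_dict("list")).all(axis=1).astype(int)

# Pair-wise: compare whole (season, name) tuples
left_pairs = pd.Series(list(zip(left["season"], left["name"])))
right_pairs = list(zip(right["season"], right["name"]))
pairwise = left_pairs.isin(right_pairs).astype(int)

print(columnwise.tolist())  # [1, 1]
print(pairwise.tolist())    # [1, 0]
```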