
How to extract a pandas DataFrame from another DataFrame based on multiple columns?

I have two pandas DataFrames, as follows:

df1

Type      season    name        qty
Fruit     summer    Mango        12
Fruit     summer    watermelon   23
Fruit     summer    blueberries  200
vegetable summer    Peppers      24


df2

Availability       season          name      city
  YEs              summer          Mango     Pune
  Yes              summer          Peppers   Mumbai
  Yes              summer          Tomatoes  Mumbai    

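I want to compare the season and name columns of df2 against df1 and add an extra column named status to df1, containing 1 where a row matches and 0 where it does not. In this case the result should look like this: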

df1
Type       season    name        qty   status
Fruit      summer    Mango        12     1
Fruit      summer    watermelon   23     0
Fruit      summer    blueberries  200    0
vegetable  summer    Peppers      24     1

Here is another option, using merge with how='left':

df1.merge(
    df2[['season', 'name']].assign(status=1),
    how='left').fillna(0)

Output:

        Type  season         name  qty  status
0      Fruit  summer        Mango   12     1.0
1      Fruit  summer   watermelon   23     0.0
2      Fruit  summer  blueberries  200     0.0
3  vegetable  summer      Peppers   24     1.0
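
The status column above comes out as float (1.0 / 0.0) because the left merge introduces NaN before fillna(0) runs. A minimal sketch of one way to get integer flags instead, assuming the same sample data as in the question: it uses merge's indicator=True option and drops duplicate key pairs from df2 first so that df1 rows are not multiplied. The names keys, right, merged and result are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

keys = ["season", "name"]

# Keep only the key columns and drop repeated pairs, so a left merge
# cannot produce more rows than df1 has
right = df2[keys].drop_duplicates()

# indicator=True adds a "_merge" column: "both" where the pair exists in df2
merged = df1.merge(right, on=keys, how="left", indicator=True)
merged["status"] = (merged["_merge"] == "both").astype(int)
result = merged.drop(columns="_merge")

print(result)
```

Whether this is preferable to fillna(0) followed by astype(int) is a matter of taste; the indicator column mainly helps when df2 may contain repeated (season, name) pairs.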

You can use .isin in the following way:

df1["status"] = list(zip(df1.season, df1.name))
df1["status"] = df1["status"].isin(list(zip(df2.season, df2.name)))

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12    True
1      Fruit  summer   watermelon   23   False
2      Fruit  summer  blueberries  200   False
3  vegetable  summer      Peppers   24    True
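
The question asks for 1/0 rather than True/False. A small sketch of the same tuple-based idea that ends with an integer column, assuming the sample data from the question; it builds the tuples in a temporary Series instead of storing them in df1 first, and uses bracket access (df1["name"]) rather than attribute access. The names left_pairs and right_pairs are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

# One (season, name) tuple per row of each frame
left_pairs = pd.Series(list(zip(df1["season"], df1["name"])), index=df1.index)
right_pairs = list(zip(df2["season"], df2["name"]))

# isin compares whole tuples, so both columns must match in the same df2 row;
# astype(int) turns True/False into 1/0
df1["status"] = left_pairs.isin(right_pairs).astype(int)

print(df1)
```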

Performance (compared with @perl's answer)

data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
 'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
 'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
 'qty': {0: 12, 1: 23, 2: 200, 3: 24}}

#@perl's answer
%%timeit 
df1 = pd.DataFrame(data) 
df1.merge( 
     df2[['season', 'name']].assign(status=1), 
     how='left').fillna(0)
                                                                       
#5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))

#434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
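
Note that the two cells above are not quite like-for-like: the merge cell rebuilds df1 on every loop, while the isin cell does not. A rough sketch of how the comparison could be repeated outside IPython with the standard timeit module, assuming the sample data from the question; the helper names with_merge and with_isin and the loop count n are hypothetical, and the absolute timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer", "summer", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})


def with_merge():
    # Left merge on the shared columns, then fill unmatched rows with 0
    return df1.merge(
        df2[["season", "name"]].assign(status=1), how="left"
    ).fillna(0)


def with_isin():
    # Tuple-wise membership test; neither function modifies df1
    left = pd.Series(list(zip(df1["season"], df1["name"])), index=df1.index)
    return left.isin(list(zip(df2["season"], df2["name"])))


n = 200  # arbitrary loop count for this sketch
t_merge = timeit.timeit(with_merge, number=n) / n
t_isin = timeit.timeit(with_isin, number=n) / n

print(f"merge: {t_merge * 1e6:.0f} µs per call")
print(f"isin:  {t_isin * 1e6:.0f} µs per call")
```

On a four-row frame the result mostly reflects fixed overhead, so it says little about how either approach scales to large inputs.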

Old (and incorrect) answer

You can use .isin together with .to_dict:

cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(1).astype('int')

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12       1
1      Fruit  summer   watermelon   23       0
2      Fruit  summer  blueberries  200       0
3  vegetable  summer      Peppers   24       1
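
The output above happens to be right for the sample data, but the approach checks each column independently: a row counts as a match if its season appears anywhere in df2's season column and its name appears anywhere in df2's name column, even when the two values never occur together in the same df2 row. A minimal sketch with made-up data (the winter rows below are hypothetical and not part of the question) that shows the difference:

```python
import pandas as pd

# Hypothetical data chosen so that values match column by column
# but not as (season, name) pairs
df1 = pd.DataFrame({
    "season": ["summer", "summer", "winter"],
    "name": ["Mango", "Peppers", "Mango"],
})
df2 = pd.DataFrame({
    "season": ["summer", "winter"],
    "name": ["Mango", "Peppers"],
})

cols = ["season", "name"]

# Column-wise check (the old answer): each column is tested on its own
columnwise = df1[cols].isin(df2[cols].to_dict("list")).all(axis=1).astype(int)

# Pair-wise check: the whole (season, name) tuple must exist in df2
pairwise = (
    pd.Series(list(zip(df1["season"], df1["name"])), index=df1.index)
    .isin(list(zip(df2["season"], df2["name"])))
    .astype(int)
)

print(columnwise.tolist())  # [1, 1, 1]
print(pairwise.tolist())    # [1, 0, 0]
```

Only (summer, Mango) is a real pair in df2, yet the column-wise version flags all three rows.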

