How to extract a pandas DataFrame from another DataFrame based on multiple columns?

I have two pandas DataFrames, as shown below:

df1

Type      season    name        qty
Fruit     summer    Mango        12
Fruit     summer    watermelon   23
Fruit     summer    blueberries  200
vegetable summer    Peppers      24


df2

Availability       season          name      city
  YEs              summer          Mango     Pune
  Yes              summer          Peppers   Mumbai
  Yes              summer          Tomatoes  Mumbai    

I want to compare the season and name columns of df2 with those of df1 and add an extra column called status to df1, where 1 means the row has a match in df2 and 0 means it does not. In this case, the result should look like this:

df1
Type       season    name        qty   status
Fruit      summer    Mango        12     1
Fruit      summer    watermelon   23     0
Fruit      summer    blueberries  200    0
vegetable  summer    Peppers      24     1

Here's another option using merge with how='left':

df1.merge(
    df2[['season', 'name']].assign(status=1),
    how='left').fillna(0)

Output:

        Type  season         name  qty  status
0      Fruit  summer        Mango   12     1.0
1      Fruit  summer   watermelon   23     0.0
2      Fruit  summer  blueberries  200     0.0
3  vegetable  summer      Peppers   24     1.0
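The output above shows status as floats (1.0 / 0.0) because the left merge introduces NaN for unmatched rows before fillna runs. The following is a minimal sketch of a variant that yields integer 0/1 as the question asks. The DataFrames are rebuilt here from the question's tables; the explicit on= argument, the drop_duplicates call (to avoid repeated df1 rows if df2 ever contains the same season/name pair twice), and the names cols, keys and result are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer"] * 4,
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer"] * 3,
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

cols = ["season", "name"]

# One row per distinct (season, name) pair in df2, flagged with status=1.
keys = df2[cols].drop_duplicates().assign(status=1)

# Left merge keeps every df1 row; unmatched rows get NaN in status.
result = df1.merge(keys, on=cols, how="left")

# Replace NaN with 0 and cast back to int so status is 0/1 rather than 0.0/1.0.
result["status"] = result["status"].fillna(0).astype(int)

print(result)
```

Filling only the status column also avoids overwriting genuine NaN values in other df1 columns, which a frame-wide fillna(0) would do.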

You can use .isin in the following way:

df1["status"] = list(zip(df1.season, df1.name))
df1["status"] = df1["status"].isin(list(zip(df2.season, df2.name)))

Output:

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12    True
1      Fruit  summer   watermelon   23   False
2      Fruit  summer  blueberries  200   False
3  vegetable  summer      Peppers   24    True
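The snippet above produces True/False, while the question asks for 1/0. As a rough sketch, assuming the same sample data as in the question, the boolean result can be cast with astype(int). The intermediate names df1_keys and df2_keys are hypothetical and only used here to avoid temporarily storing tuples in the status column.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer"] * 4,
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["YEs", "Yes", "Yes"],
    "season": ["summer"] * 3,
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

# One (season, name) tuple per row, aligned to df1's index.
df1_keys = pd.Series(list(zip(df1["season"], df1["name"])), index=df1.index)

# The (season, name) pairs present in df2.
df2_keys = list(zip(df2["season"], df2["name"]))

# Membership test on whole pairs, then cast True/False to 1/0.
df1["status"] = df1_keys.isin(df2_keys).astype(int)

print(df1)
```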

Performance (vs. @perl's answer)

data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
 'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
 'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
 'qty': {0: 12, 1: 23, 2: 200, 3: 24}}

#@perl's answer
%%timeit 
df1 = pd.DataFrame(data) 
df1.merge( 
     df2[['season', 'name']].assign(status=1), 
     how='left').fillna(0)
                                                                       
#5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))

#434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
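Note that the timings above were taken on a 4-row frame, and that the merge cell also rebuilds df1 inside the timed code while the isin cell does not, so the two numbers are not measured on equal terms. The sketch below is one way to re-run the comparison on larger data with the standard timeit module. The row counts, the random data, and the function names via_merge and via_isin are all assumptions made for illustration; no timing results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20_000  # hypothetical row count, adjust as needed

seasons = np.array(["summer", "winter", "spring", "autumn"])
names = np.array([f"item{i}" for i in range(500)])

# Synthetic frames with the same key columns as in the question.
df1 = pd.DataFrame({
    "season": rng.choice(seasons, n),
    "name": rng.choice(names, n),
    "qty": rng.integers(1, 100, n),
})
df2 = pd.DataFrame({
    "season": rng.choice(seasons, 300),
    "name": rng.choice(names, 300),
})

cols = ["season", "name"]


def via_merge():
    # Left merge against the distinct key pairs of df2.
    keys = df2[cols].drop_duplicates().assign(status=1)
    out = df1.merge(keys, on=cols, how="left")
    return out["status"].fillna(0).astype(int).to_numpy()


def via_isin():
    # Membership test on (season, name) tuples.
    keys1 = pd.Series(list(zip(df1["season"], df1["name"])), index=df1.index)
    keys2 = list(zip(df2["season"], df2["name"]))
    return keys1.isin(keys2).astype(int).to_numpy()


# Both approaches should agree before their speed is compared.
assert (via_merge() == via_isin()).all()

for fn in (via_merge, via_isin):
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```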

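Old (and wrong) answer

You can use .isin with .to_dict. This is wrong in general because DataFrame.isin with a dict checks each column independently, so a row is flagged whenever its season appears anywhere in df2.season and its name appears anywhere in df2.name, even if the two never occur together in the same df2 row: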

cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(1).astype('int')

Output:

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12       1
1      Fruit  summer   watermelon   23       0
2      Fruit  summer  blueberries  200       0
3  vegetable  summer      Peppers   24       1
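The output above happens to be correct for the question's data, which is why the problem is easy to miss. A small counterexample, using made-up data that is not from the question, shows where the per-column check and a pairwise check disagree. The names per_column and pairwise are hypothetical.

```python
import pandas as pd

# Made-up data: no (season, name) pair of df1 occurs as a row in df2,
# but every season and every name appears somewhere in df2.
df1 = pd.DataFrame({"season": ["summer", "winter"], "name": ["Mango", "Peppers"]})
df2 = pd.DataFrame({"season": ["summer", "winter"], "name": ["Peppers", "Mango"]})

cols = ["season", "name"]

# Per-column check (the old answer): each column is tested on its own.
per_column = df1[cols].isin(df2[cols].to_dict("list")).all(axis=1).astype(int)

# Pairwise check: the (season, name) tuple must exist as a row in df2.
df1_keys = pd.Series(list(zip(df1["season"], df1["name"])), index=df1.index)
pairwise = df1_keys.isin(list(zip(df2["season"], df2["name"]))).astype(int)

print(per_column.tolist())  # [1, 1]  (false positives)
print(pairwise.tolist())    # [0, 0]
```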
