pandas: create a new column by comparing the rows of one DataFrame with the columns of another DataFrame

Question

Suppose I have df1:

import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})

   alligator_apple  barbadine  capulin_cherry
0                1         11              21
1                2         12              22
2                3         13              23
3                4         14              24
4                5         15              25
5                6         16              26
6                7         17              27
7                8         18              28
8                9         19              29
9               10         20              30

And a df2:

df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

   alligator_apple  barbadine  capulin_cherry
0                6          3               1
1                7         19               9
2               15         25              15
3                5         12              27

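I'm looking for a way to create a new column in df2 that holds, for each row of df2, the number of rows of df1 in which every column's value is greater than the value in the corresponding column of that df2 row. For example: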

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1       4
1                7         19               9       1
2               15         25              15       0
3                5         12              27       3

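To elaborate: for row 0 of df2, df1.alligator_apple has 4 rows whose values exceed df2.alligator_apple's value of 6, df1.barbadine has 10 rows whose values exceed df2.barbadine's value of 3, and likewise df1.capulin_cherry has 10 rows whose values exceed df2.capulin_cherry's value of 1.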

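Finally, an "and" is applied across all of the conditions above to get the value 4 for the first row of df2.greater. This is repeated for the remaining rows of df2.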
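
To make the row-0 arithmetic concrete, here is a minimal sketch that reproduces the per-column counts and the combined count. The names row0, per_column, per_column_counts and combined are only illustrative; DataFrame.gt with axis='columns' is used so that each df1 column is compared against the value with the same label in the df2 row.

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

# Row 0 of df2 as a Series indexed by column name.
row0 = df2.iloc[0]

# Boolean DataFrame: True where a df1 value exceeds row0's value in the same column.
per_column = df1.gt(row0, axis='columns')

# Rows of df1 that pass each condition on its own.
per_column_counts = per_column.sum()

# Rows of df1 that pass all three conditions at once (the "and").
combined = int(per_column.all(axis=1).sum())

print(per_column_counts)
print(combined)
```

The per-column counts come out as 4, 10 and 10, and the combined count is 4, which matches the first value of the desired greater column.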

Is there a simple way to do this?

Answer 1

I believe this does what you need:

df2['greater'] = df2.apply(
    lambda row: 
    (df1['alligator_apple'] > row['alligator_apple']) & 
    (df1['barbadine'] > row['barbadine']) & 
    (df1['capulin_cherry'] > row['capulin_cherry']), 
    axis=1,
).sum(axis=1)

print(df2)

Output:

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1        4
1                7         19               9        1
2               15         25              15        0
3                5         12              27        3
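
In case the trailing .sum(axis=1) looks odd: because the lambda returns a boolean Series with one entry per df1 row, apply with axis=1 expands the results into a DataFrame with one row per df2 row and one column per df1 row. The sketch below just exposes that intermediate table; mask_table is a name introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

# Same lambda as in the answer, but without the final .sum(axis=1).
mask_table = df2.apply(
    lambda row:
    (df1['alligator_apple'] > row['alligator_apple'])
    & (df1['barbadine'] > row['barbadine'])
    & (df1['capulin_cherry'] > row['capulin_cherry']),
    axis=1,
)

# One row per df2 row, one boolean column per df1 row.
print(mask_table.shape)

# Counting the True values across each row gives the 'greater' column.
counts = mask_table.sum(axis=1)
print(counts.tolist())
```

Summing across each row of that table counts the df1 rows that satisfy all three conditions for the given df2 row.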

Edit: if you want to generalise this logic and apply it to a given set of columns, we can use functools.reduce together with operator.and_:

import functools
import operator

columns = ['alligator_apple', 'barbadine', 'capulin_cherry']

df2['greater'] = df2.apply(
    lambda row: functools.reduce(
        operator.and_, 
        (df1[column] > row[column] for column in columns),
    ), 
    axis=1,
).sum(axis=1)
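
As an alternative that is not part of the answer above, the same counts can probably be obtained without apply by using NumPy broadcasting. This is only a sketch: the variable names a, b and greater are made up here, and it materialises an array of len(df2) x len(df1) x len(columns) booleans, so it may not suit very large frames.

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

columns = ['alligator_apple', 'barbadine', 'capulin_cherry']

a = df1[columns].to_numpy()   # shape (10, 3)
b = df2[columns].to_numpy()   # shape (4, 3)

# Broadcast to shape (4, 10, 3): every df2 row against every df1 row, per column.
# .all(axis=2) is the "and" over columns; .sum(axis=1) counts matching df1 rows.
greater = (a[None, :, :] > b[:, None, :]).all(axis=2).sum(axis=1)

df2['greater'] = greater
print(df2)
```

Selecting the same columns list from both frames keeps the column order aligned, which the positional comparison depends on.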

Answer 2

Here is a general solution that should work well.

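def gt_mask(row, df):
    # Start from a scalar True; the first &= turns it into a boolean Series.
    mask = True
    for key, val in row.items():
        # Keep only the rows of df whose `key` column exceeds this row's value.
        mask &= df[key] > val
    # Number of rows of df that satisfy every condition.
    return len(df[mask])

# Extra keyword arguments given to apply are forwarded to gt_mask.
df2['greater'] = df2.apply(gt_mask, df=df1, axis=1)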

Resulting df2:

,alligator_apple,barbadine,capulin_cherry,greater
0,6,3,1,4
1,7,19,9,1
2,15,25,15,0
3,5,12,27,3

This builds up a mask by iterating over the key/value pairs of the given row.
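
One thing to watch with a loop over every key of the row: once df2 has a greater column, running the same apply again would look up df1['greater'], which does not exist. A possible variant, sketched under the assumption that only columns present in both frames should be compared, skips the other keys. The function name count_rows_greater is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

def count_rows_greater(row, df):
    # Hypothetical helper: start with an all-True mask over df's rows.
    mask = pd.Series(True, index=df.index)
    for key, val in row.items():
        # Assumption: keys that are not columns of df (e.g. 'greater') are ignored.
        if key in df.columns:
            mask &= df[key] > val
    # Count the rows of df that passed every comparison.
    return int(mask.sum())

df2['greater'] = df2.apply(count_rows_greater, df=df1, axis=1)

# Running it a second time still works because 'greater' is skipped.
df2['greater'] = df2.apply(count_rows_greater, df=df1, axis=1)
print(df2)
```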

Edit: this answer helped a lot: Masking a DataFrame on multiple column conditions - inside a loop


Solution 1
2, accepted, 2021-07-06 20:05:31

Solution 2
2, 2021-07-06 20:11:40
