pandas: create a new column by comparing the rows of one DataFrame with the columns of another DataFrame

Question

Assume I have df1:

import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})

   alligator_apple  barbadine  capulin_cherry
0                1         11              21
1                2         12              22
2                3         13              23
3                4         14              24
4                5         15              25
5                6         16              26
6                7         17              27
7                8         18              28
8                9         19              29
9               10         20              30

And a df2:

df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

   alligator_apple  barbadine  capulin_cherry
0                6          3               1
1                7         19               9
2               15         25              15
3                5         12              27

I'm looking for a way to create a new column in df2 that holds, for each row of df2, the number of rows in df1 where every column's value is greater than the corresponding value in that df2 row. For example:

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1        4
1                7         19               9        1
2               15         25              15        0
3                5         12              27        3

To elaborate, at row 0 of df2, df1.alligator_apple has 4 rows whose values are higher than the df2.alligator_apple value of 6. df1.barbadine has 10 rows whose values are higher than the df2.barbadine value of 3, and similarly df1.capulin_cherry has 10 rows whose values are higher than the df2.capulin_cherry value of 1.

Finally, combine all of these conditions with 'and' to get the value 4 for df2.greater in the first row. Repeat for the remaining rows of df2.
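The row 0 reasoning above can be checked by hand. The following is only a minimal sketch of that check, not a proposed solution; the names row0, per_column, per_column_counts and row0_greater are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})

# Values of row 0 of df2, copied from the question (hypothetical helper name).
row0 = {'alligator_apple': 6, 'barbadine': 3, 'capulin_cherry': 1}

# One boolean Series per column: True where df1's value exceeds row0's value.
per_column = {col: df1[col] > val for col, val in row0.items()}
per_column_counts = {col: int(mask.sum()) for col, mask in per_column.items()}

# 'and' the three conditions and count the df1 rows that satisfy all of them.
all_greater = (per_column['alligator_apple']
               & per_column['barbadine']
               & per_column['capulin_cherry'])
row0_greater = int(all_greater.sum())

print(per_column_counts)
print(row0_greater)
```

The per-column counts are 4, 10 and 10, and only the 4 rows that pass the alligator_apple test pass all three, which is where the 4 in df2.greater comes from.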

Is there a simple way to do this?

Answer 1

I believe this does what you want:

df2['greater'] = df2.apply(
    lambda row: 
    (df1['alligator_apple'] > row['alligator_apple']) & 
    (df1['barbadine'] > row['barbadine']) & 
    (df1['capulin_cherry'] > row['capulin_cherry']), 
    axis=1,
).sum(axis=1)

print(df2)

Output:

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1        4
1                7         19               9        1
2               15         25              15        0
3                5         12              27        3
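It may help to see why the trailing .sum(axis=1) is needed. Because the lambda returns a boolean Series with one entry per df1 row, apply with axis=1 collects those Series into a DataFrame with one row per df2 row and one column per df1 row. A small sketch of that intermediate step, where mask_frame and counts are names introduced here for illustration only:

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

# Same lambda as above, but keep the intermediate result instead of summing it.
mask_frame = df2.apply(
    lambda row:
    (df1['alligator_apple'] > row['alligator_apple']) &
    (df1['barbadine'] > row['barbadine']) &
    (df1['capulin_cherry'] > row['capulin_cherry']),
    axis=1,
)

# One row per df2 row, one boolean column per df1 row.
print(mask_frame.shape)

# Summing across the columns counts the df1 rows that passed every comparison.
counts = mask_frame.sum(axis=1)
print(counts.tolist())
```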

Edit: if you want to generalize this logic to an arbitrary set of columns, we can use functools.reduce together with operator.and_:

import functools
import operator

columns = ['alligator_apple', 'barbadine', 'capulin_cherry']

df2['greater'] = df2.apply(
    lambda row: functools.reduce(
        operator.and_, 
        (df1[column] > row[column] for column in columns),
    ), 
    axis=1,
).sum(axis=1)
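If the row-wise apply becomes slow on larger frames, one possible alternative, which is not part of the original answer, is to do the comparison with NumPy broadcasting. This is a sketch under the assumption that both frames contain every name in columns and that the intermediate boolean array of shape (len(df2), len(df1), len(columns)) fits in memory; the names a, b and greater are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

columns = ['alligator_apple', 'barbadine', 'capulin_cherry']

a = df1[columns].to_numpy()  # shape (10, 3)
b = df2[columns].to_numpy()  # shape (4, 3)

# a[None, :, :] has shape (1, 10, 3) and b[:, None, :] has shape (4, 1, 3),
# so the comparison broadcasts to shape (4, 10, 3).
# .all(axis=2) requires every column to be greater for a given pair of rows,
# and .sum(axis=1) counts the df1 rows that qualify for each df2 row.
greater = (a[None, :, :] > b[:, None, :]).all(axis=2).sum(axis=1)

df2['greater'] = greater
print(df2)
```

This trades memory for speed: every df2 row is compared with every df1 row at once, so it is only appropriate when the product of the two row counts is manageable.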

Answer 2

There's a general solution to this that should work well.

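def gt_mask(row, df):
    # Start from a scalar True and and-in one comparison per column of the row.
    mask = True
    for key, val in row.items():
        mask &= df[key] > val
    # Count the rows of df for which every comparison held.
    return len(df[mask])

df2['greater'] = df2.apply(gt_mask, df=df1, axis=1)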

Output df2:

,alligator_apple,barbadine,capulin_cherry,greater
0,6,3,1,4
1,7,19,9,1
2,15,25,15,0
3,5,12,27,3

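This builds a boolean mask by iterating through the key/value pairs of a given row and combining one comparison per column with 'and'. The result for that row is the number of rows of df1 for which the mask is True.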
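One caveat with this approach: row.items() walks every column of the df2 row, so if df2 already has a column that df1 lacks (for example a 'greater' column left over from an earlier run), df[key] raises a KeyError. A hedged variant that compares only the columns both frames share could look like the following; count_all_greater and shared are hypothetical names, not part of the original answer.

```python
import pandas as pd

def count_all_greater(row, df, columns):
    """Count rows of df whose values exceed row's values in every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        mask &= df[col] > row[col]
    return int(mask.sum())

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

# Simulate a df2 that already carries a column df1 does not have.
df2['greater'] = 0

# Compare only the columns present in both frames.
shared = [col for col in df2.columns if col in df1.columns]

df2['greater'] = df2.apply(
    lambda row: count_all_greater(row, df1, shared),
    axis=1,
)
print(df2)
```

Starting the mask as an all-True Series instead of a scalar True also means the function returns len(df) rather than failing when the column list is empty.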

Edit: this answer was a big help: Masking a DataFrame on multiple column conditions - inside a loop
