pandas: Create new column by comparing DataFrame rows with columns of another DataFrame

Assume I have df1:

import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})

   alligator_apple  barbadine  capulin_cherry
0                1         11              21
1                2         12              22
2                3         13              23
3                4         14              24
4                5         15              25
5                6         16              26
6                7         17              27
7                8         18              28
8                9         19              29
9               10         20              30

And a df2:

df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

   alligator_apple  barbadine  capulin_cherry
0                6          3               1
1                7         19               9
2               15         25              15
3                5         12              27

I'm looking for a way to create a new column in df2 that, for each row of df2, counts the rows of df1 in which every column's value is greater than the corresponding value in that df2 row. For example:

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1       4
1                7         19               9       1
2               15         25              15       0
3                5         12              27       3

To elaborate: for row 0 of df2, df1.alligator_apple has 4 rows whose values are greater than the df2.alligator_apple value of 6. df1.barbadine has 10 rows whose values are greater than the df2.barbadine value of 3, and similarly df1.capulin_cherry has 10 rows whose values are greater than the df2.capulin_cherry value of 1.

Finally, combine all of these conditions with a logical 'and' to get 4, the value of df2.greater for the first row. Repeat for the remaining rows of df2.
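The arithmetic for row 0 can be checked directly. This is only a minimal sketch, assuming df1 as defined above; the names row0, per_column, and mask are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})

# Values taken from row 0 of df2 (hypothetical helper dict)
row0 = {'alligator_apple': 6, 'barbadine': 3, 'capulin_cherry': 1}

# Per-column count of df1 rows whose value exceeds the df2 value
per_column = {col: int((df1[col] > val).sum()) for col, val in row0.items()}
print(per_column)  # {'alligator_apple': 4, 'barbadine': 10, 'capulin_cherry': 10}

# Rows of df1 where all three conditions hold at the same time
mask = ((df1['alligator_apple'] > 6)
        & (df1['barbadine'] > 3)
        & (df1['capulin_cherry'] > 1))
print(int(mask.sum()))  # 4
```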

Is there a simple way to do this?

I believe this does what you want:

df2['greater'] = df2.apply(
    lambda row: 
    (df1['alligator_apple'] > row['alligator_apple']) & 
    (df1['barbadine'] > row['barbadine']) & 
    (df1['capulin_cherry'] > row['capulin_cherry']), 
    axis=1,
).sum(axis=1)

print(df2)

Output:

   alligator_apple  barbadine  capulin_cherry  greater
0                6          3               1        4
1                7         19               9        1
2               15         25              15        0
3                5         12              27        3
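To see why the final .sum(axis=1) yields the counts, it may help to look at the intermediate result. As a sketch, assuming the df1 and df2 from the question: the lambda returns one boolean Series per df2 row (indexed like df1), so apply should produce a 4 x 10 boolean frame, and summing across each row counts the True values. The name bool_frame is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

# One row per df2 row, one column per df1 row; True where all conditions hold
bool_frame = df2.apply(
    lambda row:
    (df1['alligator_apple'] > row['alligator_apple']) &
    (df1['barbadine'] > row['barbadine']) &
    (df1['capulin_cherry'] > row['capulin_cherry']),
    axis=1,
)
print(bool_frame.shape)  # (4, 10)

# Summing booleans along each row counts the matching df1 rows
counts = bool_frame.sum(axis=1)
print(counts.tolist())  # [4, 1, 0, 3]
```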

Edit: to generalize this logic to an arbitrary set of columns, we can use functools.reduce together with operator.and_:

import functools
import operator

columns = ['alligator_apple', 'barbadine', 'capulin_cherry']

df2['greater'] = df2.apply(
    lambda row: functools.reduce(
        operator.and_, 
        (df1[column] > row[column] for column in columns),
    ), 
    axis=1,
).sum(axis=1)
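If the row-wise apply becomes slow on larger frames, the same counts can also be computed with NumPy broadcasting. This is an alternative technique, not part of the answer above, and only a sketch: it assumes both frames share the listed columns and that an intermediate boolean array of shape (len(df2), len(df1), number of columns) fits in memory. The names a, b, and counts are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27]})

columns = ['alligator_apple', 'barbadine', 'capulin_cherry']

a = df1[columns].to_numpy()  # shape (10, 3)
b = df2[columns].to_numpy()  # shape (4, 3)

# Broadcast (1, 10, 3) against (4, 1, 3) to compare every df1 row
# with every df2 row, column by column -> shape (4, 10, 3)
comparison = a[np.newaxis, :, :] > b[:, np.newaxis, :]

# all(axis=2): every column must be greater; sum(axis=1): count df1 rows
counts = comparison.all(axis=2).sum(axis=1)
df2['greater'] = counts
print(df2['greater'].tolist())  # [4, 1, 0, 3]
```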

There's a general solution to this that should work well.

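def gt_mask(row, df):
    # Start from a scalar True; the first &= turns it into a boolean Series.
    mask = True
    for key, val in row.items():
        # Keep only the df rows whose value in this column exceeds val.
        mask &= df[key] > val
    # Count the df rows that satisfy every column condition.
    return len(df[mask])

df2['greater'] = df2.apply(gt_mask, df=df1, axis=1)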

Output df2:

,alligator_apple,barbadine,capulin_cherry,greater
0,6,3,1,4
1,7,19,9,1
2,15,25,15,0
3,5,12,27,3

This builds a boolean mask by iterating over the key/value pairs of a given df2 row, then counts the df1 rows that satisfy it.
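One thing to watch for: because the loop walks every key in the row, it assumes every df2 column also exists in df1. If df2 already carries an extra column (for example a greater column from an earlier run), df[key] would raise a KeyError. A possible variant, sketched here with a hypothetical function name gt_count and an explicit column list passed through args, restricts the comparison to the shared columns:

```python
import pandas as pd

def gt_count(row, df, columns):
    """Count rows of df whose values exceed row's values in every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        mask &= df[col] > row[col]
    return int(mask.sum())

df1 = pd.DataFrame({'alligator_apple': range(1, 11),
                    'barbadine': range(11, 21),
                    'capulin_cherry': range(21, 31)})
df2 = pd.DataFrame({'alligator_apple': [6, 7, 15, 5],
                    'barbadine': [3, 19, 25, 12],
                    'capulin_cherry': [1, 9, 15, 27],
                    'greater': [0, 0, 0, 0]})  # stale column, ignored below

# Compare only the columns that both frames have
shared = [c for c in df2.columns if c in df1.columns]

df2['greater'] = df2.apply(gt_count, args=(df1, shared), axis=1)
print(df2['greater'].tolist())  # [4, 1, 0, 3]
```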

Edit: this answer was a big help: Masking a DataFrame on multiple column conditions - inside a loop
