
Pandas: How to sum columns based on conditional of other column values?

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

      dog         A         B         C
0    dog1  0.787575  0.159330  0.053095
1   dog10  0.770698  0.169487  0.059815
2   dog11  0.792689  0.152043  0.055268
3   dog12  0.785066  0.160361  0.054573
4   dog13  0.795455  0.150464  0.054081
5   dog14  0.794873  0.150700  0.054426
..    ....
8   dog19  0.811585  0.140207  0.048208
9    dog2  0.797202  0.152033  0.050765
10  dog20  0.801607  0.145137  0.053256
11  dog21  0.792689  0.152043  0.055268
    ....

I create a new column by summing columns "A", "B", "C" as follows:

df['total_ABC'] = df[["A", "B", "C"]].sum(axis=1)

Now I would like to do this based on a conditional, i.e. if "A" < 0.78, create a new summed column df['smallA_sum'] = df[["A", "B", "C"]].sum(axis=1). Otherwise, the value should be zero.

How does one create conditional statements like this?

My thought would be to use

df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))

However, this doesn't work and I'm not able to specify axis.

How do you create a column based on the values of other columns?

You could also do something similar for a specific dog, e.g. for the rows where df['dog'] == 'dog2', create a column dog2_sum, i.e.

 df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))

but my approach is incorrect.
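For reference, here is a minimal sketch of how the two apply attempts above could be made to run. The sample rows are a hypothetical subset in the shape of the question's data. It assumes the intent is a row-wise sum: the conditional expression needs an else branch, axis=1 is passed to apply itself, and the second condition compares the scalar row['dog'] rather than the whole column df['dog'].

```python
import pandas as pd

# Hypothetical sample rows in the shape of the question's data.
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog2"],
    "A": [0.787575, 0.770698, 0.797202],
    "B": [0.159330, 0.169487, 0.152033],
    "C": [0.053095, 0.059815, 0.050765],
})

# axis=1 hands each row to the lambda; "x if cond" alone is a syntax error,
# so the else branch supplies the zero for rows that fail the condition.
df["smallA_sum"] = df.apply(
    lambda row: row["A"] + row["B"] + row["C"] if row["A"] < 0.78 else 0.0,
    axis=1,
)

# Compare the scalar row["dog"], not the full column df["dog"].
df["dog2_sum"] = df.apply(
    lambda row: row["A"] + row["B"] + row["C"] if row["dog"] == "dog2" else 0.0,
    axis=1,
)
```

Row-wise apply calls the lambda once per row, so it tends to be slow on large frames; it is shown here only to explain why the original attempts fail.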

The following should work. Here we select only the rows of the df where the condition is met and sum those. On assignment, the rows where the condition isn't met get NaN, so we call fillna on the new column:

In [67]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df

Out[67]:
          A         B         C
0  0.197334  0.707852 -0.443475
1 -1.063765 -0.914877  1.585882
2  0.899477  1.064308  1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858  0.777523 -0.453747

In [73]:
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
# assign the result back; inplace fillna on a selected column is deprecated in recent pandas
df['total'] = df['total'].fillna(0)
df

Out[73]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
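To see where the NaN comes from, the two steps can be separated. This is a small sketch on made-up values (the frame and the names partial and n_missing are illustrative, not from the answer): the filtered sum only has the index labels of the matching rows, and column assignment aligns on the index.

```python
import pandas as pd

# Illustrative data, not the random frame from the session above.
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# The sum is computed only for rows where A > 0, so its index is [0, 2].
partial = df.loc[df["A"] > 0, ["A", "B"]].sum(axis=1)

# Assignment aligns on the index: row 1 has no matching label and becomes NaN.
df["total"] = partial
n_missing = int(df["total"].isna().sum())

# Replace the NaN with 0 by assigning the filled column back.
df["total"] = df["total"].fillna(0)
```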

Another approach is to call where on the sum result. It takes a second argument, the value to use where the condition isn't met:

In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df

Out[75]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
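As a small sketch on made-up values (the names row_sum, kept and same are illustrative), Series.where keeps the entries where the condition is True and substitutes the second argument elsewhere. Series.mask is its inverse, so the same result can be written with the negated condition.

```python
import pandas as pd

# Illustrative data, not the random frame from the session above.
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [10.0, 20.0, 30.0]})
row_sum = df[["A", "B"]].sum(axis=1)

# where: keep the sum where A > 0, use 0 everywhere else.
kept = row_sum.where(df["A"] > 0, 0)

# mask: replace the sum with 0 where A <= 0 (the negated condition).
same = row_sum.mask(df["A"] <= 0, 0)
```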

Another approach is to use the numpy.where() function to select values. It returns elements chosen from the sum result where the condition is met, and 0 otherwise. Due to a lower overhead, numpy functions are usually faster than their pandas cousins. Barring numba-jitted or Cython loops, this is the fastest approach for this specific task.

import numpy as np
df['Total'] = np.where(df['A'] < 0.78, df[['A','B','C']].sum(axis=1), 0)

or

df['total'] = np.where(df['dog'] == 'dog2', df[['A','B','C']].sum(axis=1), 0)
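Put together on a hypothetical subset of the question's rows, this might look like the sketch below. The column names smallA_sum and dog2_sum are taken from the question; computing the row sum once and reusing it is just a convenience. Note that np.where returns a plain NumPy array, which is then assigned to the column by position.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows in the shape of the question's data.
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog2"],
    "A": [0.787575, 0.770698, 0.797202],
    "B": [0.159330, 0.169487, 0.152033],
    "C": [0.053095, 0.059815, 0.050765],
})

# Compute the row-wise sum once and reuse it for both conditions.
row_sum = df[["A", "B", "C"]].sum(axis=1)

# np.where(condition, value_if_true, value_if_false) works element-wise.
small_a = np.where(df["A"] < 0.78, row_sum, 0)
df["smallA_sum"] = small_a
df["dog2_sum"] = np.where(df["dog"] == "dog2", row_sum, 0)
```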

