Pandas Data Frame - Sum all the values in a previous column which match a specific condition and add it to a new column
I'm probably missing something, but I was not able to find a solution for this. Is there a way in Python to sum the values that satisfy a certain condition and put the result in a new column? In Excel I would apply the following formula in the new column and paste it down the rows:
=SUMIF(A1:C1, ">0")
val1 | val2 | val3 | output |
---|---|---|---|
0.5 | 0.7 | -0.9 | 1.2 |
0.3 | -0.7 | | 0.3 |
-0.5 | -0.7 | -0.9 | 0 |
Also, my extracts contain a few blank values. Can you please help me understand what code should be written for this?
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
I came across the above code, but it checks only one condition. What I need is the sum of all the columns that match the given condition.
Thanks!
pandas can handle that pretty much out of the box, like this:
import pandas as pd
df = pd.DataFrame([[0.5,.7,-.9],[0.3,-.7,None],[-0.5,-.7,-.9]], columns=['val1','val2','val3'])
df['output'] = df[df>0].sum(axis=1)
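To make the mechanics of the line above more visible, here is a minimal sketch that splits it into two steps using the question's sample data. The names `positive_only`, `row_sums`, and `strict_sums` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.5, 0.7, -0.9], [0.3, -0.7, None], [-0.5, -0.7, -0.9]],
    columns=["val1", "val2", "val3"],
)

# Boolean indexing keeps values where the condition is True
# and puts NaN everywhere else (blanks stay NaN as well).
positive_only = df[df > 0]

# sum() skips NaN by default, so a row with no matches sums to 0.0.
row_sums = positive_only.sum(axis=1)

# Optional: min_count=1 returns NaN instead of 0.0 when no value matched.
strict_sums = positive_only.sum(axis=1, min_count=1)

print(row_sums)
print(strict_sums)
```

Blank cells need no special handling here, because NaN never satisfies `> 0` and is skipped by the sum.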
Use DataFrame.clip before sum:
df['total'] = df[['val1','val2','val3']].clip(lower=0).sum(axis=1)
#solution by Nk03 from comments
cols = ['val1','val2','val3']
df['total'] = df[cols].mask(df[cols]<0).sum(axis=1)
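One caveat worth spelling out: `clip(lower=0)` matches `SUMIF(..., ">0")` only because the threshold is zero, so clipped values contribute nothing to the sum. The sketch below assumes a hypothetical non-zero threshold of 0.4 (not from the question) to show where the two approaches diverge; `where` is used here as the counterpart of `mask`.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.5, 0.7, -0.9], [0.3, -0.7, None], [-0.5, -0.7, -0.9]],
    columns=["val1", "val2", "val3"],
)
cols = ["val1", "val2", "val3"]
threshold = 0.4  # hypothetical threshold, chosen only for illustration

# clip raises small values up to the threshold, so they still add to the sum.
clipped = df[cols].clip(lower=threshold).sum(axis=1)

# where keeps only values above the threshold and drops the rest (as NaN).
filtered = df[cols].where(df[cols] > threshold).sum(axis=1)

print(clipped)
print(filtered)
```

For any threshold other than zero, the `where` or `mask` form is the one that behaves like SUMIF.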
EDIT: To build the mask from a different set of columns, convert them to a NumPy array first:
df['total'] = df.loc[:, "D":"F"].mask(df.loc[:, "A":"C"].to_numpy() == 'Y', 0).sum(axis=1)
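A small runnable sketch of the line above, assuming a made-up frame where columns A to C hold 'Y'/'N' flags and D to F hold numbers (the data is hypothetical). The conversion to a NumPy array matters because pandas aligns a DataFrame condition by column label, and the labels A to C do not match D to F; a plain array is applied by position instead.

```python
import pandas as pd

# Hypothetical data: flag columns A..C, value columns D..F.
df = pd.DataFrame(
    {
        "A": ["Y", "N"], "B": ["N", "N"], "C": ["Y", "Y"],
        "D": [1.0, 2.0], "E": [3.0, 4.0], "F": [5.0, 6.0],
    }
)

# Positional boolean array: True where the flag is 'Y'.
flags = df.loc[:, "A":"C"].to_numpy() == "Y"

# mask replaces the flagged positions with 0, then the rows are summed.
df["total"] = df.loc[:, "D":"F"].mask(flags, 0).sum(axis=1)

print(df)
```

Note that this zeroes out the values whose flag is 'Y'; to sum only the flagged values instead, `where(flags, 0)` would be the mirror image.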
Another way, somewhat similar to SUMIF:
# this is the "IF"
is_positive = df.loc[:, "val1": "val3"] > 0
# this is selecting the parts where condition holds & sums
df["output"] = df.loc[:, "val1": "val3"][is_positive].sum(axis=1)
where axis=1 in the last line sums along rows, to get
>>> df
val1 val2 val3 output
0 0.5 0.7 -0.9 1.2
1 0.3 -0.7 NaN 0.3
2 -0.5 -0.7 -0.9 0.0
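If the same pattern is needed with several conditions, it could be wrapped in a small helper. The sketch below is one possible shape; `row_sumif` is a hypothetical name, not a pandas function, and the condition is passed as a callable that returns a boolean frame.

```python
import pandas as pd


def row_sumif(frame, columns, condition):
    """Sum, per row, the values in `columns` for which `condition` is True."""
    selected = frame[columns]
    # where() keeps matching values and turns the rest into NaN,
    # which sum() then skips.
    return selected.where(condition(selected)).sum(axis=1)


df = pd.DataFrame(
    [[0.5, 0.7, -0.9], [0.3, -0.7, None], [-0.5, -0.7, -0.9]],
    columns=["val1", "val2", "val3"],
)

# Equivalent of =SUMIF(A1:C1, ">0") filled down the rows.
df["output"] = row_sumif(df, ["val1", "val2", "val3"], lambda x: x > 0)
print(df)
```

Selecting the columns by name before assigning keeps the new `output` column out of later sums if the helper is called again.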
You can do it in the following way:
df["total"] = df.apply(lambda x: sum(x), axis=1).where((df['A'] > 0) & (df['B'] > 0) & (another_condition) & (another_condition), 0)
Note that this code sums across all columns at once.
To sum only specific columns, you can do the following:
df['total'] = df[['A','B','C','D','E']].sum(axis=1).where((df['A'] > 0) & (df['B'] > 0) & (another_condition) & (another_condition), 0)
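It may help to see how this row-level `where` differs from an element-wise SUMIF. In the sketch below, which reuses the question's sample data and column names rather than A to E, the row-level version keeps the full row sum (negatives included) only when every condition holds, while the element-wise version sums just the positive cells. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.5, 0.7, -0.9], [0.3, -0.7, None], [-0.5, -0.7, -0.9]],
    columns=["val1", "val2", "val3"],
)
cols = ["val1", "val2", "val3"]

# Row-level: whole-row sum if val1 > 0 AND val2 > 0, otherwise 0.
row_level = df[cols].sum(axis=1).where((df["val1"] > 0) & (df["val2"] > 0), 0)

# Element-wise: only the cells that are > 0 are summed (SUMIF-like).
element_wise = df[cols].where(df[cols] > 0).sum(axis=1)

print(pd.DataFrame({"row_level": row_level, "element_wise": element_wise}))
```

On the first row the two results differ (about 0.3 versus 1.2), so the choice depends on whether the condition is meant to filter rows or individual values.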