Python - Pandas DF - sum values in a column that match a condition in another column
I would like to sum values in one column based on a condition in another column. I can do this when the condition value exists, but if it does not, I get an error. I need the code to accept that the condition value doesn't exist and move on to the next step.
Example df:
import pandas as pd
technologies = {
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee': [22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration': ['30days','50days','55days','40days','60days','35days','55days','50days']
}
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration'])
print(df)
   Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2   Hadoop  23000   55days
3   Python  24000   40days
4   Pandas  26000   60days
5   Hadoop  25000   35days
6    Spark  25000   55days
7   Python  22000   50days
For this example, I would like to sum the fee for all rows that have "55days":
duration = df.groupby('Duration')['Fee'].sum()["55days"]
print(duration)
48000
# but if I choose a value that does not appear under Duration, like "22days", I get an error (a KeyError)
duration22 = df.groupby('Duration')['Fee'].sum()["22days"]
Can you please advise how I can code this so that, if the value "22days" happens not to exist on a given run, it does not fail, or it just returns 0 when there is no match?
You could check whether the key is present in the grouped index before looking it up.
gd_sum = df.groupby('Duration')['Fee'].sum()
def dur_sum(k):
    # return the grouped sum if the key exists, otherwise fall back to 0
    return gd_sum[k] if k in gd_sum.index else 0
print(dur_sum('55days'))
48000
print(dur_sum('22days'))
0
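As an alternative to the index check above, here is a minimal sketch of two other ways to get a 0 for a missing duration. It assumes the same example df as in the question. The names total_55, total_22 and fee_for are made up for illustration. The first approach uses Series.get with a default value, and the second skips groupby and sums a boolean-mask selection, which gives 0 when no rows match.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Hadoop", "Spark", "Python"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 25000, 25000, 22000],
    "Duration": ["30days", "50days", "55days", "40days", "60days", "35days", "55days", "50days"],
})

gd_sum = df.groupby("Duration")["Fee"].sum()

# Option 1: Series.get returns the given default instead of raising KeyError
total_55 = gd_sum.get("55days", 0)
total_22 = gd_sum.get("22days", 0)

# Option 2 (hypothetical helper): filter rows with a boolean mask and sum;
# summing an empty selection yields 0, so no special case is needed
def fee_for(duration):
    return df.loc[df["Duration"] == duration, "Fee"].sum()

print(total_55, total_22)
print(fee_for("55days"), fee_for("22days"))
```

The mask version is probably simplest when only one or two durations are needed. The grouped Series may be the better fit if many different durations are looked up on the same data.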