How to find non-zero median/mean of multiple columns in pandas?

I have a long list of columns for which I want to calculate the non-zero median, mean, and standard deviation in one go. I cannot simply drop the rows that contain a 0 in one column, because another column in the same row may hold a non-zero value.

Below is the code I currently have, which calculates the median, mean, etc. with the zeros included.

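    import numpy as np

    # Same aggregations for every column; zeros are still included here.
    agg_list_oper = {
        'ABC1': [max, np.std, np.mean, np.median],
        'ABC2': [max, np.std, np.mean, np.median],
        'ABC3': [max, np.std, np.mean, np.median],
        'ABC4': [max, np.std, np.mean, np.median],
        # ..... (remaining columns elided in the original)
    }

    df = df_tmp.groupby(['id']).agg(agg_list_oper).reset_index()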

I know I could write a long loop that processes one column at a time. Is there a way to do this elegantly with pandas groupby.agg() or some other function?

You can temporarily replace the 0s with NaN. pandas then skips the NaN values when it calculates the median, and likewise the mean and standard deviation.

    df_tmp.replace(0, np.nan).groupby(['id']).agg(agg_list_oper).reset_index()
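
As a minimal, self-contained sketch of the same idea, the example below uses made-up sample values (only the names `id`, `ABC1`, and `ABC2` come from the question; `value_cols`, `nonzero`, and `result` are hypothetical). It assumes every column needs the same aggregations, so a single list of aggregator names stands in for the per-column dictionary. It also masks zeros only in the value columns, on the assumption that an `id` of 0 should remain a valid group key.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names follow the question, values are made up.
df_tmp = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "ABC1": [0, 2, 4, 0, 0, 6],
    "ABC2": [1, 0, 3, 5, 7, 0],
})

# Columns to aggregate (hypothetical name; list every ABC* column here).
value_cols = ["ABC1", "ABC2"]

# Turn zeros into NaN in the value columns only, leaving the id column untouched.
nonzero = df_tmp[value_cols].replace(0, np.nan)

# Group by the original id column; these pandas aggregations skip NaN,
# so each statistic is computed over the non-zero values only.
result = nonzero.groupby(df_tmp["id"]).agg(["max", "std", "mean", "median"])

print(result)
```

For `id` 1, the non-zero values of `ABC1` are 2 and 4, so both the mean and the median come out as 3 rather than the 2 that including the zero would give. A group with only one non-zero value gets NaN for `std`, because the sample standard deviation is undefined for a single observation.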

