How to calculate the mean of a specific value in columns in Python?

I'm trying to drop columns that have too many missing values. The missing values are encoded as 99 or 90, so how can I count how often those values occur within each column?

Here is the code that is supposed to drop the columns that exceed the threshold value:

threshold = 0.6

data = data[data.columns[[data.column == 90 or data.column == 99].count().mean() < threshold]]

I'm not very used to pandas yet, so any suggestions would be helpful.

You're almost there. Use apply:

threshold = 0.6
out = data[data.apply(lambda s: s.isin([90, 99])).mean(1).lt(threshold)]

Example input:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
4  99  99   0  90  99  # to drop
5  99   0   0   0  99
6   0   0  99   0  90
7   0  90  99   0  90  # to drop
8  99  90   0  90   0  # to drop
9   0  99   0   0   0

Output:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
5  99   0   0   0  99
6   0   0  99   0  90
9   0  99   0   0   0
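
Note that mean(1) averages across each row, so the snippet above filters rows, which is what the example shows. Since the question asks about columns, here is a minimal sketch of both variants on the same example data. The names is_missing, kept_rows and kept_cols are only illustrative, and DataFrame.isin is used directly in place of apply, which should give the same boolean mask.

```python
import pandas as pd

# Example frame from above; 90 and 99 stand for missing values.
data = pd.DataFrame([
    [0, 90, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 90, 0, 99, 0],
    [90, 0, 0, 0, 0],
    [99, 99, 0, 90, 99],
    [99, 0, 0, 0, 99],
    [0, 0, 99, 0, 90],
    [0, 90, 99, 0, 90],
    [99, 90, 0, 90, 0],
    [0, 99, 0, 0, 0],
])

threshold = 0.6

# Boolean mask: True wherever a cell holds one of the missing-value codes.
is_missing = data.isin([90, 99])

# Row-wise: keep rows whose share of missing codes is below the threshold.
kept_rows = data[is_missing.mean(axis=1).lt(threshold)]

# Column-wise: keep columns whose share of missing codes is below the threshold.
kept_cols = data.loc[:, is_missing.mean(axis=0).lt(threshold)]
```

With this data, the column-wise variant drops only column 1, where 6 of the 10 values are 90 or 99. Because the comparison is strict (lt), a share of exactly 0.6 counts as exceeding the threshold; use le instead if such columns should be kept.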
