How to calculate the mean of a specific value in columns in Python?
I'm trying to drop columns that have too many missing values. Since the missing values are represented by 99 or 90, how can I count the occurrences of those values within each column?
Here is the code that is supposed to drop the columns that exceed the threshold value:
threshold = 0.6
data = data[data.columns[[data.column == 90 or data.column == 99].count().mean() < threshold]]
I'm not quite used to using pandas; any suggestions would be helpful.
You're almost there. Use apply:
threshold = 0.6
out = data[data.apply(lambda s: s.isin([90, 99])).mean(1).lt(threshold)]
Example input:
    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
4  99  99   0  90  99  # to drop
5  99   0   0   0  99
6   0   0  99   0  90
7   0  90  99   0  90  #
8  99  90   0  90   0  #
9   0  99   0   0   0
Output:
    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
5  99   0   0   0  99
6   0   0  99   0  90
9   0  99   0   0   0
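Note that the snippet above takes the mean along axis 1, so it filters rows, as the example shows. If the goal is to drop columns instead, as the question's wording suggests, the same idea should work with the mean taken per column. The following is a minimal sketch under that assumption; the names missing_codes, is_missing, rows_kept and cols_kept are made up for illustration, and the frame is rebuilt from the example input.

```python
import pandas as pd

# Example frame rebuilt from the input shown above
data = pd.DataFrame(
    [
        [0, 90, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 90, 0, 99, 0],
        [90, 0, 0, 0, 0],
        [99, 99, 0, 90, 99],
        [99, 0, 0, 0, 99],
        [0, 0, 99, 0, 90],
        [0, 90, 99, 0, 90],
        [99, 90, 0, 90, 0],
        [0, 99, 0, 0, 0],
    ]
)

threshold = 0.6
missing_codes = [90, 99]  # sentinel values that stand for "missing"

# Boolean frame: True wherever a cell holds one of the sentinel values
is_missing = data.isin(missing_codes)

# Row-wise filter (what the answer does): keep rows whose fraction
# of missing cells is strictly below the threshold
rows_kept = data[is_missing.mean(axis=1).lt(threshold)]

# Column-wise filter (what the question asks for): keep columns whose
# fraction of missing cells is strictly below the threshold
cols_kept = data.loc[:, is_missing.mean(axis=0).lt(threshold)]

print(rows_kept)
print(cols_kept)
```

With this data, column 1 holds a sentinel in 6 of its 10 rows, so its fraction is not strictly below 0.6 and it is the only column dropped. Using le instead of lt would keep columns that sit exactly at the threshold.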