How do I calculate the mean of specific values in a column in Python?

Question

I'm trying to drop columns that have too many missing values. Since the missing values are represented as 99 or 90, how can I count the occurrences of those values within each column?

Here is the code that is supposed to drop the columns that exceed the threshold:

threshold = 0.6

data = data[data.columns[[data.column == 90 or data.column == 99].count().mean() < threshold]]
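For reference, the attempt above cannot run as written: `data.column` is attribute access for a single column literally named `column`, not a way to loop over all columns, and Python's `or` cannot combine two boolean Series, because pandas refuses to reduce a whole Series to a single True/False. A minimal sketch of just the counting step, on a small made-up DataFrame and assuming 90 and 99 are the only missing-value codes: `isin` marks matching cells element-wise, `sum()` counts them per column and `mean()` gives the share per column.

```python
import pandas as pd

# Hypothetical toy data: 90 and 99 are sentinel codes for "missing"
data = pd.DataFrame({
    "a": [1, 90, 3, 99],
    "b": [99, 99, 90, 4],
    "c": [5, 6, 7, 8],
})

# Element-wise test: True where a cell holds one of the sentinel codes.
# Equivalent to (data == 90) | (data == 99); note `|`, not `or`.
is_missing = data.isin([90, 99])

counts = is_missing.sum()      # sentinel cells per column: a=2, b=3, c=0
fractions = is_missing.mean()  # share per column: a=0.5, b=0.75, c=0.0

print(counts)
print(fractions)
```

The share per column is what gets compared against the threshold when deciding which columns to drop.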

I'm not very used to pandas, so any suggestions would be helpful.

Answer 1

You're almost there. Use apply:

threshold = 0.6
out = data[data.apply(lambda s: s.isin([90, 99])).mean(1).lt(threshold)]
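Note that `.mean(1)` (that is, `axis=1`) averages across each row, so the line above keeps the rows, not the columns, whose share of 90/99 values is below the threshold. Since the question asks about columns, here is a minimal sketch of a column-wise variant, assuming the same sentinel codes and using made-up toy data: averaging down each column (`axis=0`, the default) and selecting with `.loc[:, mask]` keeps only the columns under the threshold.

```python
import pandas as pd

threshold = 0.6

# Hypothetical toy data; 90 and 99 mark missing values
data = pd.DataFrame({
    "keep_me": [1, 2, 90, 4, 5],    # 1/5 = 0.2 missing
    "drop_me": [99, 90, 99, 4, 5],  # 3/5 = 0.6 missing (not < 0.6)
    "also_ok": [99, 2, 3, 90, 5],   # 2/5 = 0.4 missing
})

# Share of sentinel values in each column (mean over axis=0, the default)
missing_share = data.isin([90, 99]).mean()

# Boolean mask indexed by column labels; .loc[:, mask] selects columns
out = data.loc[:, missing_share.lt(threshold)]

print(out.columns.tolist())  # ['keep_me', 'also_ok']
```

Because `lt` is a strict `<`, a column sitting exactly at the threshold is dropped; use `le` instead if such columns should be kept.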

Example input:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
4  99  99   0  90  99  # to drop
5  99   0   0   0  99
6   0   0  99   0  90
7   0  90  99   0  90  # to drop
8  99  90   0  90   0  # to drop
9   0  99   0   0   0

Output:

    0   1   2   3   4
0   0  90   0   0   0
1   0   0   0   0   0
2   0  90   0  99   0
3  90   0   0   0   0
5  99   0   0   0  99
6   0   0  99   0  90
9   0  99   0   0   0
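To double-check the example: rows 4, 7 and 8 contain 4, 3 and 3 sentinel values out of 5, i.e. shares of 0.8, 0.6 and 0.6. Because `lt` is a strict `<`, the two rows sitting exactly at 0.6 are dropped as well. The sketch below rebuilds the example input (values copied from the table) and runs the answer's expression on it, splitting it into two steps so the intermediate row shares are visible.

```python
import pandas as pd

# Example input copied from the table (columns 0..4, rows 0..9)
data = pd.DataFrame([
    [0, 90, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 90, 0, 99, 0],
    [90, 0, 0, 0, 0],
    [99, 99, 0, 90, 99],
    [99, 0, 0, 0, 99],
    [0, 0, 99, 0, 90],
    [0, 90, 99, 0, 90],
    [99, 90, 0, 90, 0],
    [0, 99, 0, 0, 0],
])

threshold = 0.6

# Share of 90/99 values in each row (axis=1 averages across the columns)
row_share = data.apply(lambda s: s.isin([90, 99])).mean(axis=1)
print(row_share.tolist())
# [0.2, 0.0, 0.4, 0.2, 0.8, 0.4, 0.4, 0.6, 0.6, 0.2]

# Keep rows strictly below the threshold, as in the answer
out = data[row_share.lt(threshold)]
print(out.index.tolist())  # [0, 1, 2, 3, 5, 6, 9]
```

The kept index labels match the output table.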

Answer 1: accepted, score 4, 2022-09-05 15:22:21