Python/Pandas - Remove all columns from a dataframe where > 50% of the rows have the value 0

Question

I am using Python and Pandas. I want to drop every column from my dataframe in which more than 50% of the rows have the value 0 in that column.

Here is an example:

import pandas as pd

# defining a dataframe
data = [['Alex', 10, 173, 0, 4000], ['Bob', 12, 0, 0, 4000], ['Clarke', 13, 0, 0, 0]]
# naming the columns
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Score', 'Income'])

# printing the dataframe
print(df)

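I managed to build a table that shows, for each column, the number of rows with the value 0 and the corresponding percentage. But I have a feeling I am going about this the wrong way. Can anyone help?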

# make a new dataframe and count the number of values = zero per column
zeroValues = df.eq(0).sum(axis=0)
zeroValues = zeroValues.to_frame()

# name the column
zeroValues.columns = ["# of zero values"]

# add a column that calculates the percentage of values = zero
zeroValues["zeroValues %"] = (zeroValues["# of zero values"] * 100) / len(df.index)

# print the result
print(zeroValues)

Answer 1

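First use DataFrame.mean on df.eq(0) to get the fraction of 0 values per column, then filter with loc, keeping only the columns where that fraction is less than or equal to 0.5: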

zeroValues = df.eq(0).mean()
print (zeroValues)
Name      0.000000
Age       0.000000
Height    0.666667
Score     1.000000
Income    0.333333
dtype: float64

print (zeroValues <= 0.5)
Name       True
Age        True
Height    False
Score     False
Income     True
dtype: bool

df = df.loc[:, zeroValues <= 0.5]
print (df)
     Name  Age  Income
0    Alex   10    4000
1     Bob   12    4000
2  Clarke   13       0
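
To connect this with the count-based table in the question: the mean of a boolean frame is the number of True values divided by the number of rows, so both approaches give the same fraction. A minimal sketch that checks this on the example data (the names `df_full`, `fraction_by_mean` and `fraction_by_count` are illustrative and not from the original post):

```python
import pandas as pd

# Rebuild the unfiltered example frame under a separate name.
data = [['Alex', 10, 173, 0, 4000], ['Bob', 12, 0, 0, 4000], ['Clarke', 13, 0, 0, 0]]
df_full = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Score', 'Income'])

# Boolean frame: True wherever a cell equals 0.
is_zero = df_full.eq(0)

# Fraction of zeros per column, computed two ways.
fraction_by_mean = is_zero.mean()
fraction_by_count = is_zero.sum() / len(df_full)

# Compare with a small tolerance to avoid floating-point surprises.
same = bool(((fraction_by_mean - fraction_by_count).abs() < 1e-12).all())
print(same)
```

So the table built in the question was not wrong; `mean()` just skips the intermediate count.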

One-line solution:

df = df.loc[:, df.eq(0).mean().le(.5)]
print (df)
     Name  Age  Income
0    Alex   10    4000
1     Bob   12    4000
2  Clarke   13       0
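
If the cutoff needs to vary, the one-liner could be wrapped in a small helper. This is only a sketch: the function name `drop_mostly_zero_columns` and its `threshold` parameter are hypothetical and not part of pandas.

```python
import pandas as pd

def drop_mostly_zero_columns(frame, threshold=0.5):
    """Return a copy of frame without the columns whose share of zeros exceeds threshold."""
    # Fraction of cells equal to 0 in each column.
    zero_share = frame.eq(0).mean()
    # Keep columns whose zero share is at or below the threshold.
    return frame.loc[:, zero_share.le(threshold)].copy()

data = [['Alex', 10, 173, 0, 4000], ['Bob', 12, 0, 0, 4000], ['Clarke', 13, 0, 0, 0]]
df_full = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Score', 'Income'])

result = drop_mostly_zero_columns(df_full)
print(list(result.columns))

# A stricter cutoff also drops Income, where one of three rows is 0.
stricter = drop_mostly_zero_columns(df_full, threshold=0.25)
print(list(stricter.columns))
```

Note that missing values (NaN) do not compare equal to 0, so in this sketch they count as non-zero.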

Answer 1
2 Accepted 2017-10-01 12:34:04
