
How to drop float feature where % NaNs is higher than a certain number?

I am trying to drop a feature (column) if it is a float and its share of missing values is greater than a certain threshold.

I tried:

# Define threshold to 1/6
threshold = 0.1667

# Drop float > threshold
for f in data:
    if data[f].dtype==float & data[f].isnull().sum() / data.shape[0] > threshold: del data[f]

..which raises an error:

TypeError: unsupported operand type(s) for &: 'type' and 'numpy.float64'

Help would be appreciated.

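Use DataFrame.select_dtypes to select only the float columns, test for missing values with isnull, and take the mean (sum/count) to get the fraction of NaNs per column. Then add the non-float columns back with Series.reindex so they are always kept. Finally, invert the condition from > to <= and filter the columns by boolean indexing: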

import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame(np.random.choice([np.nan,1], p=(0.2,0.8),size=(10,10))).assign(A='a')
print (df)
     0    1    2    3    4    5    6    7    8    9  A
0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
1  1.0  1.0  NaN  1.0  NaN  1.0  NaN  1.0  1.0  1.0  a
2  1.0  1.0  1.0  1.0  1.0  NaN  1.0  NaN  1.0  1.0  a
3  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0  a
4  1.0  NaN  1.0  1.0  1.0  1.0  1.0  NaN  1.0  1.0  a
5  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0  1.0  1.0  a
6  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
7  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
8  1.0  NaN  1.0  1.0  1.0  1.0  NaN  1.0  1.0  1.0  a
9  NaN  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  a

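threshold = 0.1667
# Fraction of NaNs per float column; non-float columns are filled with False
# (which compares like 0), so they always pass the filter below
df1 = df.select_dtypes(float).isnull().mean().reindex(df.columns, fill_value=False)
# Keep only the columns whose NaN fraction is at most the threshold
df = df.loc[:, df1 <= threshold]
print (df)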
     0    2    3    4    5    8    9  A
0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
1  1.0  NaN  1.0  NaN  1.0  1.0  1.0  a
2  1.0  1.0  1.0  1.0  NaN  1.0  1.0  a
3  1.0  1.0  1.0  1.0  1.0  NaN  1.0  a
4  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
5  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
6  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
7  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
8  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
9  NaN  1.0  1.0  1.0  1.0  1.0  NaN  a
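
As for why the original loop fails: in Python, & binds more tightly than == and >, so the condition first tries to evaluate float & (data[f].isnull().sum() / data.shape[0]), which is the 'type' & 'numpy.float64' operation named in the TypeError. Below is a minimal sketch of a loop-style alternative, assuming unique column names. The helper name drop_sparse_float_columns and the demo frame are hypothetical and used only for illustration; is_float_dtype comes from pandas.api.types.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype


def drop_sparse_float_columns(data, threshold):
    """Return a copy of `data` without float columns whose NaN fraction
    exceeds `threshold`. Hypothetical helper, not part of pandas."""
    to_drop = [
        col for col in data.columns
        # Both operands are plain scalars here, so the boolean `and` is used.
        # With `&`, each comparison would need its own parentheses.
        if is_float_dtype(data[col]) and data[col].isnull().mean() > threshold
    ]
    # Collect the names first and drop once, rather than deleting columns
    # while iterating over the frame.
    return data.drop(columns=to_drop)


# Illustrative data only (not taken from the question)
demo = pd.DataFrame({
    "mostly_nan": [np.nan, np.nan, 1.0, 1.0],  # float, 50% NaN
    "few_nan": [np.nan, 1.0, 1.0, 1.0],        # float, 25% NaN
    "label": ["a", None, None, "b"],           # non-float, 50% missing
})
result = drop_sparse_float_columns(demo, threshold=0.3)
print(result.columns.tolist())
```

Only the float column above the threshold is removed; the non-float column is kept even though half of its values are missing, which matches the behaviour asked for in the question.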

