Pandas: drop all columns that contain any NaN, except one

Question

I would like to drop all columns that contain any NaN, except for one particular column.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6], [4, np.nan, 6, 7], [1, 2, 3, 4]], columns=['A', 'B', 'C', 'D'])

>>> df
     A    B    C  D
0  1.0  2.0  NaN  4
1  1.0  2.0  4.0  5
2  NaN  6.0  NaN  6
3  4.0  NaN  6.0  7
4  1.0  2.0  3.0  4

I would like to drop every column that contains NaN except df['C'], so that the result is:

>>> df
     C  D
0  NaN  4
1  4.0  5
2  NaN  6
3  6.0  7
4  3.0  4

Answer 1

IIUC, use isna() + any() to check which columns to drop:

d = df.isna().any()

Set the columns you want to ignore to False:

cols_to_ignore = ['C']
d[cols_to_ignore] = False

Then select the remaining columns with loc:

df.loc[:, ~d]

     C  D
0  NaN  4
1  4.0  5
2  NaN  6
3  6.0  7
4  3.0  4
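
The same idea can be written as a single expression that does not modify the mask after building it. This is only a sketch: the helper name drop_nan_columns_except and its keep parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def drop_nan_columns_except(frame, keep):
    """Return `frame` without the columns that contain NaN, except those in `keep`.

    Hypothetical helper for illustration; not a pandas API.
    """
    # True for columns with no NaN at all, or for columns we were asked to keep.
    keep_mask = frame.notna().all() | frame.columns.isin(keep)
    return frame.loc[:, keep_mask]


df = pd.DataFrame(
    [[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6],
     [4, np.nan, 6, 7], [1, 2, 3, 4]],
    columns=['A', 'B', 'C', 'D'],
)

result = drop_nan_columns_except(df, ['C'])
print(result)
```

Because the mask is computed in one step, the original column order is preserved and df itself is left untouched.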

Answer 2

A one-liner using the subset parameter of pandas dropna:

df.dropna(subset=[n for n in df if n != 'column_to_keep'], inplace=True)

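column_to_keep is the column whose NaN values you want to preserve. Note that with the default axis=0, dropna removes the rows that have NaN in any of the other columns; it does not remove columns, so it does not produce the output asked for in the question.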
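
To see what the subset one-liner actually does on the example frame, here is a small check next to a column-wise alternative. It is a sketch: the names rows_filtered and cols_filtered are illustrative, and the non-inplace form of dropna is used so that df stays unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6],
     [4, np.nan, 6, 7], [1, 2, 3, 4]],
    columns=['A', 'B', 'C', 'D'],
)

others = [c for c in df if c != 'C']

# subset with the default axis=0 filters ROWS: any row with NaN in A, B or D goes.
rows_filtered = df.dropna(subset=others)
print(rows_filtered)

# Column-wise alternative: drop the other columns that contain at least one NaN.
cols_filtered = df.drop(columns=[c for c in others if df[c].isna().any()])
print(cols_filtered)
```

The first result keeps all four columns and loses rows 2 and 3; the second keeps all five rows and only columns C and D.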

Answer 3

Instead of protecting the column up front, you can drop every column that contains NaN and then add the one you want back on.

newdf = df.dropna(axis=1).copy()  # .copy() is only here to suppress a warning
newdf['C'] = df['C']
newdf
# produces this dataframe:
   D    C
0  4  NaN
1  5  4.0
2  6  NaN
3  7  6.0
4  4  3.0
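
With this approach the re-added column ends up last (D, C rather than C, D). If the original column order matters, one possible fix is to reselect the columns in the order they appear in df. This is a sketch; the name ordered is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6],
     [4, np.nan, 6, 7], [1, 2, 3, 4]],
    columns=['A', 'B', 'C', 'D'],
)

newdf = df.dropna(axis=1).copy()  # columns without any NaN
newdf['C'] = df['C']              # re-attach the protected column (appended last)

# Reselect the surviving columns in their original order.
ordered = newdf[[c for c in df.columns if c in newdf.columns]]
print(ordered)
```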

Answer 4

I would also use isna().any(), but combined with df.columns.difference(['columns_to_ignore']):

tmp = df[df.columns.difference(['C'])].isna().any()

df.drop(tmp.index[tmp], axis=1)

     C  D
0  NaN  4
1  4.0  5
2  NaN  6
3  6.0  7
4  3.0  4

Explanation:

tmp is a boolean Series indexed by column name, excluding the columns you want to ignore:

>>> tmp
A     True
B     True
D    False
dtype: bool

so tmp.index[tmp] returns an Index with the labels of the columns to drop:

>>> tmp.index[tmp]
Index(['A', 'B'], dtype='object')

Answer 5

You can use combine_first:

df.dropna(axis=1).combine_first(df[['C']])
Out[301]: 
     C  D
0  NaN  4
1  4.0  5
2  NaN  6
3  6.0  7
4  3.0  4
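
A brief sketch of why this works: combine_first returns a frame over the union of both column sets, taking values from the caller first and filling the gaps from the argument, so column C comes back in from df[['C']]. The names combined and expected are illustrative, and the comparison ignores dtypes on the assumption that dtype handling in combine_first may vary between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6],
     [4, np.nan, 6, 7], [1, 2, 3, 4]],
    columns=['A', 'B', 'C', 'D'],
)

no_nan = df.dropna(axis=1)                  # only D survives
combined = no_nan.combine_first(df[['C']])  # union of columns: C and D

# Same values as simply selecting the two columns from the original frame.
expected = df[['C', 'D']]
pd.testing.assert_frame_equal(combined, expected, check_dtype=False)
print(combined)
```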

5 answers

Answer 1: 6, accepted, 2018-07-23 23:45:09
Answer 2: 2, 2020-10-02 12:09:35
Answer 3: 1, 2018-07-23 23:48:23
Answer 4: 1, 2018-07-23 23:48:29
Answer 5: 1, 2018-07-24 00:58:28
