Pandas: dropping all the columns that contain any NaN except one
I would like to drop all columns that contain any NaN, except for one particular column.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,np.nan,4],[1,2,4,5],[np.nan,6,np.nan,6],[4,np.nan,6,7],[1,2,3,4]], columns=['A','B','C','D'])
>>> df
     A    B    C  D
0  1.0  2.0  NaN  4
1  1.0  2.0  4.0  5
2  NaN  6.0  NaN  6
3  4.0  NaN  6.0  7
4  1.0  2.0  3.0  4
I would like to drop every column containing NaN except df['C'], so that the result is:
>>> df
     C  D
0  NaN  4
1  4.0  5
2  NaN  6
3  6.0  7
4  3.0  4
A one-liner using the subset parameter of pandas dropna:
df.dropna(subset=[n for n in df if n != 'column_to_keep'], inplace=True)
column_to_keep is the column where you want NaN to be preserved. Note that dropna defaults to axis=0, so this removes rows that have NaN in any other column; it does not remove columns.
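To see what the one-liner above does on the question's sample frame, here is a minimal sketch. It uses the non-inplace form and takes 'C' as column_to_keep; the variable names others and rows_kept are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6],
     [4, np.nan, 6, 7], [1, 2, 3, 4]],
    columns=['A', 'B', 'C', 'D'],
)

# Every column except 'C' is checked for NaN.
others = [name for name in df if name != 'C']

# Default axis=0: rows with NaN in A, B or D are removed,
# while NaN in 'C' is tolerated. All four columns remain.
rows_kept = df.dropna(subset=others)
print(rows_kept)
```

On this data, rows 2 and 3 are dropped and all four columns survive, so this variant fits when the goal is filtering rows rather than columns.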
Instead of excluding the column up front, you can drop every column that contains NaN and then add the one you want back on.
newdf = df.dropna(axis=1).copy() #.copy() is only here to suppress a warning.
newdf['C'] = df['C']
newdf
# produces this DataFrame:
   D    C
0  4  NaN
1  5  4.0
2  6  NaN
3  7  6.0
4  4  3.0
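The drop-and-add-back approach appends C as the last column. If the original column order matters, one possible follow-up is to reselect the columns in the order they appear in df. This is a sketch under that assumption, and ordered is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6],
     [4, np.nan, 6, 7], [1, 2, 3, 4]],
    columns=['A', 'B', 'C', 'D'],
)

# Drop every column containing NaN, then restore 'C' from the original.
newdf = df.dropna(axis=1).copy()
newdf['C'] = df['C']

# Reselect the surviving columns in their original left-to-right order.
ordered = newdf[[col for col in df.columns if col in newdf.columns]]
print(ordered)
```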
I would use isna().any(), combined with df.columns.difference(['columns_to_ignore']):
tmp = df[df.columns.difference(['C'])].isna().any()
df.drop(tmp.index[tmp], axis=1)
     C  D
0  NaN  4
1  4.0  5
2  NaN  6
3  6.0  7
4  3.0  4
Explanation:
tmp is a boolean Series, indexed by column name, that excludes the columns you want to ignore:
>>> tmp
A     True
B     True
D    False
dtype: bool
so tmp.index[tmp] returns an Index of the columns to drop:
>>> tmp.index[tmp]
Index(['A', 'B'], dtype='object')
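The two steps explained above could be wrapped in a small helper for reuse. The function name drop_nan_columns_except and its parameters are hypothetical and not part of pandas; this is only a sketch of the same isna().any() plus columns.difference idea.

```python
import numpy as np
import pandas as pd


def drop_nan_columns_except(frame, keep):
    """Return a copy of `frame` without NaN-containing columns, sparing `keep`."""
    # Columns eligible for dropping: everything not listed in `keep`.
    candidates = frame.columns.difference(keep)
    # Boolean Series indexed by column name: True where the column has any NaN.
    has_nan = frame[candidates].isna().any()
    # has_nan.index[has_nan] is the Index of column names to remove.
    return frame.drop(columns=has_nan.index[has_nan])


df = pd.DataFrame(
    [[1, 2, np.nan, 4], [1, 2, 4, 5], [np.nan, 6, np.nan, 6],
     [4, np.nan, 6, 7], [1, 2, 3, 4]],
    columns=['A', 'B', 'C', 'D'],
)

result = drop_nan_columns_except(df, ['C'])
print(result)
```

Because drop returns a new frame, df itself is left unchanged, and passing several names in keep spares more than one column.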
You can use combine_first:
df.dropna(axis=1).combine_first(df[['C']])  # axis must be passed by keyword in pandas 2.0+
Out[301]:
     C  D
0  NaN  4
1  4.0  5
2  NaN  6
3  6.0  7
4  3.0  4