pandas: how to drop rows when all float columns are NaN
I have the following df:
AAA BBB CCC DDD ID1 ID2 ID3 ID4
0 txt txt txt txt 10 NaN 12 NaN
1 txt txt txt txt 10 NaN 12 13
2 txt txt txt txt NaN NaN NaN NaN
With the following dtypes:
AAA object
BBB object
CCC object
DDD object
ID1 float64
ID2 float64
ID3 float64
ID4 float64
Is there a way to drop rows only when ALL float columns are NaN?
Expected output:
AAA BBB CCC DDD ID1 ID2 ID3 ID4
0 txt txt txt txt 10 NaN 12 NaN
1 txt txt txt txt 10 NaN 12 13
I can't do it with df.dropna(subset=['ID1','ID2','ID3','ID4']) because my real df has several float columns whose names are dynamic.
Thanks
Use DataFrame.select_dtypes to get all float columns, then test for non-missing values with notna and use DataFrame.any to keep the rows that have at least one non-missing value, so rows whose float columns are all missing are removed:
df1 = df[df.select_dtypes(float).notna().any(axis=1)]
print(df1)
AAA BBB CCC DDD ID1 ID2 ID3 ID4
0 txt txt txt txt 10.0 NaN 12.0 NaN
1 txt txt txt txt 10.0 NaN 12.0 13.0
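To see why the mask works, the intermediate steps can be inspected one at a time. The sketch below rebuilds the question's frame by hand (the literal values are assumed from the printed table) and names each step; the variable names floats, has_value and keep are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's printed table (values assumed).
df = pd.DataFrame({
    "AAA": ["txt"] * 3,
    "BBB": ["txt"] * 3,
    "CCC": ["txt"] * 3,
    "DDD": ["txt"] * 3,
    "ID1": [10, 10, np.nan],
    "ID2": [np.nan, np.nan, np.nan],
    "ID3": [12, 12, np.nan],
    "ID4": [np.nan, 13, np.nan],
})

floats = df.select_dtypes(float)   # only the float64 columns ID1..ID4
has_value = floats.notna()         # True where a float cell is not NaN
keep = has_value.any(axis=1)       # True if the row has at least one float value

df1 = df[keep]                     # boolean indexing keeps rows 0 and 1
print(keep.tolist())
print(df1.index.tolist())
```

Row 2 is the only row where every float cell is NaN, so it is the only row for which keep is False.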
Your solution with DataFrame.dropna should be changed to pass the float columns as subset and to set the parameter how='all', which drops a row only if all of those columns are NaN:
df1 = df.dropna(subset=df.select_dtypes(float).columns, how='all')
# to modify the same DataFrame in place instead:
# df.dropna(subset=df.select_dtypes(float).columns, how='all', inplace=True)
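The how argument matters here because dropna defaults to how='any'. A minimal sketch of the difference, using a reduced version of the question's frame (values assumed from the printed table):

```python
import numpy as np
import pandas as pd

# Reduced sample frame; ID2 is NaN in every row.
df = pd.DataFrame({
    "AAA": ["txt"] * 3,
    "ID1": [10, 10, np.nan],
    "ID2": [np.nan, np.nan, np.nan],
    "ID3": [12, 12, np.nan],
    "ID4": [np.nan, 13, np.nan],
})
float_cols = df.select_dtypes(float).columns

# Default how='any': a single NaN in the subset is enough to drop the row.
dropped_any = df.dropna(subset=float_cols)

# how='all': the row is dropped only if every subset column is NaN.
dropped_all = df.dropna(subset=float_cols, how="all")

print(len(dropped_any))
print(dropped_all.index.tolist())
```

With the default, every row is removed because ID2 is always NaN; with how='all' only the last row is removed.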
If the frame may contain several float dtypes, select them with np.floating:
df1 = df.dropna(subset=df.select_dtypes(np.floating).columns, how='all')
Use:
df.dropna(subset=df.select_dtypes(include=np.number).columns, how='all')
I'd suggest using include=np.number because it includes all float dtypes, all of which may contain NaN. When you use include=float, you just get the standard np.float64 dtype.
For illustration:
>>> df['ID5'] = np.array([1,2,np.nan], dtype=np.float16)
>>> df.select_dtypes(include=float).columns.tolist()
['ID1', 'ID2', 'ID3', 'ID4']
>>> df.select_dtypes(include=np.number).columns.tolist()
['ID1', 'ID2', 'ID3', 'ID4', 'ID5']
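One caveat worth checking before relying on np.number: it also matches integer columns, which never hold NaN, so with how='all' a frame that has any integer column would never lose a row. The sketch below uses a small made-up frame (the column names name, qty, x and y are hypothetical) to compare np.number with np.floating:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and two float columns.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "qty": [1, 2, 3],                                        # integer, never NaN
    "x": [1.0, np.nan, np.nan],                              # float64
    "y": np.array([2.0, np.nan, np.nan], dtype=np.float32),  # float32
})

number_cols = df.select_dtypes(include=np.number).columns      # qty, x, y
floating_cols = df.select_dtypes(include=np.floating).columns  # x, y

# qty is always filled, so no row is ever "all NaN" across number_cols.
by_number = df.dropna(subset=number_cols, how="all")

# Restricting to float dtypes drops the rows where x and y are both NaN.
by_floating = df.dropna(subset=floating_cols, how="all")

print(len(by_number))
print(by_floating.index.tolist())
```

If the frame has no integer columns, both selections give the same result; otherwise np.floating is the narrower choice.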
You can replace 0 with NaN and then drop the columns that contain only NaN:
df.loc[:,~df.replace(0,np.nan).isna().all()]
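For comparison with the row-based answers, here is a sketch of what this expression does on a reduced version of the question's frame (values assumed from the printed table). It filters columns, so the all-NaN row stays in the result:

```python
import numpy as np
import pandas as pd

# Reduced sample frame; ID2 is NaN in every row.
df = pd.DataFrame({
    "AAA": ["txt"] * 3,
    "ID1": [10, 10, np.nan],
    "ID2": [np.nan, np.nan, np.nan],
    "ID3": [12, 12, np.nan],
    "ID4": [np.nan, 13, np.nan],
})

# isna().all() is True for columns that are NaN (or 0) in every row;
# ~ inverts the mask so that .loc keeps the remaining columns.
result = df.loc[:, ~df.replace(0, np.nan).isna().all()]

print(result.columns.tolist())
print(len(result))
```

Only the ID2 column is removed here; the number of rows is unchanged.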