
pandas: how to drop rows when all float columns are NaN

I have the following df:

  AAA BBB CCC DDD  ID1  ID2  ID3  ID4
0 txt txt txt txt  10   NaN  12   NaN
1 txt txt txt txt  10   NaN  12   13
2 txt txt txt txt  NaN  NaN  NaN  NaN

With the following dtypes:

AAA          object
BBB          object
CCC          object
DDD          object
ID1          float64
ID2          float64
ID3          float64
ID4          float64

Is there a way to drop rows only when ALL float columns are NaN?

Expected output:

  AAA BBB CCC DDD  ID1  ID2  ID3  ID4
0 txt txt txt txt  10   NaN  12   NaN
1 txt txt txt txt  10   NaN  12   13

I can't do it with df.dropna(subset=['ID1','ID2','ID3','ID4']) because my real df has a varying number of float columns whose names are not known in advance.

Thanks

Use DataFrame.select_dtypes to get all float columns, test for non-missing values with DataFrame.notna, and then use DataFrame.any to keep only the rows that have at least one non-missing value. Rows whose float values are all missing are removed:

df1 = df[df.select_dtypes(float).notna().any(axis=1)]
print(df1)
   AAA  BBB  CCC  DDD   ID1  ID2   ID3   ID4
0  txt  txt  txt  txt  10.0  NaN  12.0   NaN
1  txt  txt  txt  txt  10.0  NaN  12.0  13.0
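
To make the steps above easier to follow, here is a minimal self-contained sketch that rebuilds the frame from the question and splits the one-liner into named intermediate steps. The variable names float_part and has_value are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "AAA": ["txt"] * 3,
    "BBB": ["txt"] * 3,
    "CCC": ["txt"] * 3,
    "DDD": ["txt"] * 3,
    "ID1": [10, 10, np.nan],
    "ID2": [np.nan, np.nan, np.nan],
    "ID3": [12, 12, np.nan],
    "ID4": [np.nan, 13, np.nan],
})

# Step 1: keep only the float columns, whatever their names are.
float_part = df.select_dtypes(float)

# Step 2: True for rows that hold at least one non-NaN float value.
has_value = float_part.notna().any(axis=1)

# Step 3: boolean indexing keeps the rows flagged True.
df1 = df[has_value]

print(has_value.tolist())
print(df1.index.tolist())
```

Because the mask is built only from the float columns, the text columns never influence which rows are kept.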

Your DataFrame.dropna solution can be adapted by passing the float columns as subset and setting how='all', so that a row is dropped only when all of those values are NaN:

df1 = df.dropna(subset=df.select_dtypes(float).columns, how='all')
# or, to modify the same DataFrame in place:
#df.dropna(subset=df.select_dtypes(float).columns, how='all', inplace=True)
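
Since the real frame has a changing set of float columns, the dropna call could be wrapped in a small function. The sketch below is one possible way to do that; drop_rows_all_float_nan is a hypothetical name, and the guard for frames without any float column is an added assumption so that the result does not depend on how dropna treats an empty subset.

```python
import numpy as np
import pandas as pd


def drop_rows_all_float_nan(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose float columns are all NaN."""
    float_cols = frame.select_dtypes(float).columns.tolist()
    if not float_cols:
        # No float columns at all: return an unchanged copy (assumed behaviour).
        return frame.copy()
    # how="all" drops a row only when every column in subset is NaN.
    return frame.dropna(subset=float_cols, how="all")


df = pd.DataFrame({
    "AAA": ["txt", "txt", "txt"],
    "ID1": [10.0, 10.0, np.nan],
    "ID2": [np.nan, 13.0, np.nan],
})

result = drop_rows_all_float_nan(df)
text_only = drop_rows_all_float_nan(df[["AAA"]])

print(result.index.tolist())
print(len(text_only))
```

The function returns a new frame and leaves its input untouched, which avoids the side effects of inplace=True.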

If the frame may contain several float dtypes (for example float32 as well as float64), select them with np.floating:

import numpy as np

df1 = df.dropna(subset=df.select_dtypes(np.floating).columns, how='all')

Use:

df.dropna(subset=df.select_dtypes(include=np.number).columns, how='all')

I'd suggest using include=np.number because it covers every float dtype, all of which can contain NaN. When you use include=float, you only get the standard np.float64 dtype (newer pandas versions may also match np.float32). Be aware that np.number also matches integer columns.

For illustration:

df['ID5'] = np.array([1,2,np.nan], dtype=np.float16)


>>> df.select_dtypes(include=float).columns.tolist()
['ID1', 'ID2', 'ID3', 'ID4']

>>> df.select_dtypes(include=np.number).columns.tolist()
['ID1', 'ID2', 'ID3', 'ID4', 'ID5']
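
One point worth checking before choosing between np.number and np.floating is whether the frame has integer columns. A NumPy integer column cannot hold NaN, so if it ends up in subset, how='all' never finds a row where everything is missing. The sketch below illustrates this on a made-up frame; the column name N and the variable names are assumptions for the example only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AAA": ["txt", "txt", "txt"],
    "N": [1, 2, 3],                  # integer column, never NaN
    "ID1": [10.0, 10.0, np.nan],
    "ID2": [np.nan, 13.0, np.nan],
})

# np.number matches integers and floats; np.floating matches floats only.
numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
floating_cols = df.select_dtypes(include=np.floating).columns.tolist()

# With N in the subset, no row is entirely NaN, so nothing is dropped.
kept_numeric = df.dropna(subset=numeric_cols, how="all")

# With floats only, the last row (all float values NaN) is dropped.
kept_floating = df.dropna(subset=floating_cols, how="all")

print(numeric_cols, len(kept_numeric))
print(floating_cols, len(kept_floating))
```

If the frame is known to have no integer columns, both selectors give the same result here.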

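You can replace 0 with NaN and then drop whatever contains only NaN. Note that, as written, this removes columns rather than rows: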

df.loc[:,~df.replace(0,np.nan).isna().all()]
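
To show what this expression actually selects, here is a small sketch on a made-up numeric frame (the columns A, B and C and the names all_missing and trimmed are illustrative assumptions). It treats 0 as missing and keeps only the columns that still hold at least one value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, 0.0, np.nan],
    "B": [0.0, np.nan, 0.0],
    "C": [np.nan, np.nan, np.nan],
})

# Treat 0 as missing, then flag the columns where every value is missing.
all_missing = df.replace(0, np.nan).isna().all()

# Keep the columns that are not entirely missing; the rows are untouched.
trimmed = df.loc[:, ~all_missing]

print(all_missing.tolist())
print(trimmed.columns.tolist())
```

Because the reduction runs down each column, the number of rows does not change, so this does not by itself remove the all-NaN row from the question.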
