pandas: slice dataframe based on NaN

I have the following dataframe df:

prod_id prod_ref
10      ef3920
12      bovjhd
NaN     lkbljb
NaN     jknnkn
30      kbknkn

I am trying the following:

df[df['prod_id'] != np.nan]

but I get exactly the same dataframe back.

I would like to display:

prod_id prod_ref
10      ef3920
12      bovjhd
30      kbknkn

What am I doing wrong?

Use the notna function, or invert isna with ~:

print(df[df.prod_id.notna()])
   prod_id prod_ref
0     10.0   ef3920
1     12.0   bovjhd
4     30.0   kbknkn

print(df[~df.prod_id.isna()])
   prod_id prod_ref
0     10.0   ef3920
1     12.0   bovjhd
4     30.0   kbknkn

Another solution is dropna, but you need to specify the column to check for NaN:

print(df.dropna(subset=['prod_id']))
   prod_id prod_ref
0     10.0   ef3920
1     12.0   bovjhd
4     30.0   kbknkn

If the other columns contain no NaN values, you can use Alberto Garcia-Raboso's solution.

The problem is that np.nan != np.nan is True (equivalently, np.nan == np.nan is False), so df['prod_id'] != np.nan is True for every row and the mask filters nothing. Pandas provides the .dropna() method to do what you want:

df.dropna()

Output:

   prod_id prod_ref
0     10.0   ef3920
1     12.0   bovjhd
4     30.0   kbknkn
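
To see why the original comparison kept every row, here is a minimal sketch. It rebuilds the question's df by hand, which is an assumption about how the frame was created; the names broken_mask and working_mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's dataframe.
df = pd.DataFrame({
    "prod_id": [10, 12, np.nan, np.nan, 30],
    "prod_ref": ["ef3920", "bovjhd", "lkbljb", "jknnkn", "kbknkn"],
})

# NaN never compares equal to anything, including itself.
nan_equals_itself = np.nan == np.nan        # False
nan_differs_from_itself = np.nan != np.nan  # True

# Consequently this mask is True on every row, so nothing is filtered out.
broken_mask = df["prod_id"] != np.nan

# notna() tests for missing values directly instead of comparing with ==/!=.
working_mask = df["prod_id"].notna()
filtered = df[working_mask]
print(filtered)
```

The comparison operators follow IEEE 754 semantics for NaN, which is why pandas offers isna/notna as dedicated missing-value tests.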

By default, .dropna() will drop any row that has a NaN in any column. You can tweak this behavior in two ways:

  • check only some columns using the subset argument, and
  • drop a row only when it contains NaN in all columns (or in all columns of the subset, if you are using it) with how='all'; the default is how='any'.
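
The two options can be compared side by side in a small sketch. The frame df2 below is hypothetical (it is not the question's data) and was chosen so that each call keeps a different set of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 lacks prod_ref, row 2 lacks prod_id, row 3 lacks both.
df2 = pd.DataFrame({
    "prod_id": [10, 12, np.nan, np.nan],
    "prod_ref": ["ef3920", np.nan, "lkbljb", np.nan],
})

# Default (how='any'): drop a row if any column is NaN.
dropped_any = df2.dropna()

# subset: only look at prod_id when deciding what to drop.
dropped_subset = df2.dropna(subset=["prod_id"])

# how='all': drop a row only if every column is NaN.
dropped_all = df2.dropna(how="all")

print(dropped_any.index.tolist())
print(dropped_subset.index.tolist())
print(dropped_all.index.tolist())
```

None of these calls modifies df2; each returns a new dataframe, so assign the result if you want to keep it.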

You can check the documentation for DataFrame.dropna for details.
