
Pandas - If all values of DataFrame are NaN

How can I write an if statement that does the following?

 if all values in dataframe are nan:
     do something 
 else: 
     do something else

According to this post, one can check whether all the values of a DataFrame are NaN. I know that this does not work:

if df.isnull().all():
    do something

It raises the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You need a second all, because the first all returns a Series (one boolean per column) and the second reduces that Series to a scalar:

if df.isnull().all().all():
    do something

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(5), columns=list('abcde'))
print (df)
     a    b    c    d    e
0  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN

print (df.isnull())
      a     b     c     d     e
0  True  True  True  True  True
1  True  True  True  True  True
2  True  True  True  True  True
3  True  True  True  True  True
4  True  True  True  True  True

print (df.isnull().all())
a    True
b    True
c    True
d    True
e    True
dtype: bool

print (df.isnull().all().all())
True

if df.isnull().all().all():
    print ('do something')
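
To tie this back to the if/else in the question, here is a minimal sketch. The helper name all_nan and the two sample frames are made up for illustration; they are not part of pandas.

```python
import numpy as np
import pandas as pd


def all_nan(df: pd.DataFrame) -> bool:
    """Return True if every cell of df is missing (hypothetical helper)."""
    # The first all() reduces each column to one boolean;
    # the second all() reduces that Series to a single scalar.
    return bool(df.isnull().all().all())


# Two illustrative frames: one entirely NaN, one with a single real value.
empty = pd.DataFrame(np.nan, index=range(3), columns=list("ab"))
partial = empty.copy()
partial.loc[0, "a"] = 1.0

for name, frame in [("empty", empty), ("partial", partial)]:
    if all_nan(frame):
        print(name, "-> all values are NaN")
    else:
        print(name, "-> at least one value is present")
```

Note that a DataFrame with no cells at all also gives True here, since all() over nothing is vacuously true. Check df.empty separately if that case matters to you.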

If you need a faster solution, use numpy.isnan with numpy.all, but first convert the DataFrame to a NumPy array with values:

print (np.isnan(df.values).all())
True
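
One caveat with the NumPy route: np.isnan only accepts numeric arrays and raises TypeError on object arrays. A frame built with only an index and columns, like the sample above, typically has object dtype. A possible workaround, sketched below with a hypothetical helper all_nan_fast, is to use np.isnan only for float data and fall back to isnull otherwise.

```python
import numpy as np
import pandas as pd


def all_nan_fast(df: pd.DataFrame) -> bool:
    """Hypothetical helper: NumPy fast path with a pandas fallback."""
    values = df.values
    if np.issubdtype(values.dtype, np.floating):
        # Float data: np.isnan works directly on the underlying array.
        return bool(np.isnan(values).all())
    # Object or mixed data: np.isnan may raise TypeError here,
    # so let pandas decide what counts as missing.
    return bool(df.isnull().values.all())


float_df = pd.DataFrame(np.full((4, 3), np.nan))
object_df = pd.DataFrame(index=range(5), columns=list("abcde"))
mixed_df = pd.DataFrame({"a": [np.nan, "x"], "b": [np.nan, np.nan]})

print(all_nan_fast(float_df))
print(all_nan_fast(object_df))
print(all_nan_fast(mixed_df))
```

The fallback branch gives up some of the speed advantage, but it avoids the TypeError on non-numeric frames.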

Timings:

df = pd.DataFrame(np.full((1000,1000), np.nan))
print (df)

In [232]: %timeit (np.isnan(df.values).all())
1000 loops, best of 3: 1.23 ms per loop

In [233]: %timeit (df.isnull().all().all())
100 loops, best of 3: 10 ms per loop

In [234]: %timeit (df.isnull().values.all())
1000 loops, best of 3: 1.46 ms per loop

A faster variant of jezrael's answer is df.isnull().values.all():

In [156]: df.isnull().values.all()
Out[156]: True

Benchmarks

small

In [149]: df.shape
Out[149]: (5, 5)

In [150]: %timeit df.isnull().values.all()
10000 loops, best of 3: 112 µs per loop

In [151]: %timeit df.isnull().all().all()
1000 loops, best of 3: 271 µs per loop

large

In [153]: df.shape
Out[153]: (1000, 1000)

In [154]: %timeit df.isnull().values.all()
10 loops, best of 3: 26.6 ms per loop

In [155]: %timeit df.isnull().all().all()
10 loops, best of 3: 40.8 ms per loop
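
%timeit is an IPython magic, so the runs above will not work in a plain script. To reproduce the comparison outside IPython, something like the following sketch with the standard timeit module should do. The frame size and repeat counts are arbitrary choices, and absolute numbers will vary with the machine and with the pandas and NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the "large" case: a 1000 x 1000 float frame of NaN.
df = pd.DataFrame(np.full((1000, 1000), np.nan))

candidates = {
    "np.isnan(df.values).all()": lambda: np.isnan(df.values).all(),
    "df.isnull().values.all()": lambda: df.isnull().values.all(),
    "df.isnull().all().all()": lambda: df.isnull().all().all(),
}

results = {}
for label, func in candidates.items():
    # Record the answer so the three approaches can be compared for agreement.
    results[label] = bool(func())
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{label:<28} {best * 1e3:8.3f} ms per call")
```

All three expressions should agree on the result; only the timings differ.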
