Pandas - If all values of DataFrame are NaN
How can I write an if statement that does the following:
if all values in dataframe are nan:
    do something
else:
    do something else
According to this post, one can check whether all the values of a DataFrame are NaN. I know one cannot do:
if df.isnull().all():
    do something
It raises the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
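A minimal reproduction, assuming a small all-NaN frame: `if` calls `bool()` on its condition, and pandas refuses to turn a multi-element Series into a single truth value.

```python
import numpy as np
import pandas as pd

# A small frame where every value is NaN (float dtype)
df = pd.DataFrame(np.full((3, 2), np.nan), columns=["a", "b"])

per_column = df.isnull().all()  # one bool per column -> a Series, not a scalar
print(type(per_column).__name__)  # Series

try:
    if per_column:  # `if` calls bool(per_column), which pandas rejects
        pass
except ValueError as exc:
    print("ValueError:", exc)
```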
You need a second all(), because the first all() returns a Series (one boolean per column) and the second reduces that Series to a scalar:
if df.isnull().all().all():
    do something
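The pseudocode from the question can then be filled in as below. `describe_frame` and its return strings are hypothetical placeholders for "do something" / "do something else".

```python
import numpy as np
import pandas as pd


def describe_frame(df: pd.DataFrame) -> str:
    """Hypothetical helper: branch on whether every cell of df is NaN."""
    # First .all() reduces each column to one bool (a Series);
    # the second .all() reduces that Series to a single scalar.
    if df.isnull().all().all():
        return "all NaN"  # "do something"
    return "has data"  # "do something else"


all_nan = pd.DataFrame(np.full((2, 2), np.nan), columns=["a", "b"])
partial = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, np.nan]})

print(describe_frame(all_nan))  # all NaN
print(describe_frame(partial))  # has data
```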
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(5), columns=list('abcde'), dtype=float)
print(df)
     a    b    c    d    e
0  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN
print(df.isnull())
      a     b     c     d     e
0  True  True  True  True  True
1  True  True  True  True  True
2  True  True  True  True  True
3  True  True  True  True  True
4  True  True  True  True  True
print(df.isnull().all())
a    True
b    True
c    True
d    True
e    True
dtype: bool
print(df.isnull().all().all())
True
if df.isnull().all().all():
    print('do something')
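One edge case worth checking: `all()` over zero elements is vacuously `True`, so an empty DataFrame also passes this test. If an empty frame should take the other branch, check `df.empty` as well. A minimal sketch; `all_nan_nonempty` is a hypothetical name.

```python
import numpy as np
import pandas as pd

empty = pd.DataFrame()
print(empty.isnull().all().all())  # True: all() over nothing is vacuously True


def all_nan_nonempty(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True only if df has at least one cell and all cells are NaN."""
    return (not df.empty) and bool(df.isnull().all().all())


print(all_nan_nonempty(empty))  # False
print(all_nan_nonempty(pd.DataFrame({"a": [np.nan]})))  # True
```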
If you need a faster solution, use numpy.isnan with numpy.all, but first convert the DataFrame to a NumPy array with values. Note that np.isnan only accepts numeric data; on object-dtype columns it raises a TypeError:
print(np.isnan(df.values).all())
True
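A caveat on the NumPy route: a frame created from `index` and `columns` alone typically has `object` columns, and `np.isnan` is not defined for that dtype. A minimal sketch of the failure, plus two alternatives: `pd.isna` on the underlying array (works for any dtype) and converting to float first (works only if every value is numeric or NaN). The object frame here is forced with `astype(object)` to make the example deterministic.

```python
import numpy as np
import pandas as pd

# Force object dtype to mimic pd.DataFrame(index=..., columns=...) with no data
obj = pd.DataFrame(np.full((3, 2), np.nan), columns=["a", "b"]).astype(object)

try:
    np.isnan(obj.to_numpy()).all()
except TypeError as exc:
    print("np.isnan failed:", exc)

# pd.isna accepts arrays of any dtype and returns a boolean array of the same shape
print(pd.isna(obj.to_numpy()).all())  # True

# Alternatively, convert to float first when every value is numeric or NaN
print(np.isnan(obj.to_numpy(dtype=float)).all())  # True
```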
Timings:
df = pd.DataFrame(np.full((1000, 1000), np.nan))
print(df)
In [232]: %timeit (np.isnan(df.values).all())
1000 loops, best of 3: 1.23 ms per loop
In [233]: %timeit (df.isnull().all().all())
100 loops, best of 3: 10 ms per loop
In [234]: %timeit (df.isnull().values.all())
1000 loops, best of 3: 1.46 ms per loop
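The `%timeit` figures above come from one machine and one pandas version. A plain-script sketch using the standard `timeit` module to rerun the comparison locally; the `candidates` dict and the repeat/number settings are arbitrary choices, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.full((1000, 1000), np.nan))

# Each candidate returns True when every value in df is NaN
candidates = {
    "np.isnan(df.values).all()": lambda: np.isnan(df.values).all(),
    "df.isnull().all().all()": lambda: df.isnull().all().all(),
    "df.isnull().values.all()": lambda: df.isnull().values.all(),
}

for label, fn in candidates.items():
    assert fn()  # sanity check: all three agree on this frame
    # best of 3 repeats of 20 calls each, reported per call in milliseconds
    best = min(timeit.repeat(fn, repeat=3, number=20)) / 20
    print(f"{label:30s} {best * 1e3:.3f} ms per call")
```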
A faster improvement on jezrael's answer would be df.isnull().values.all():
In [156]: df.isnull().values.all()
Out[156]: True
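Because `df.isnull()` has already produced a boolean frame, `.values` yields a plain `bool` array whatever the original column dtypes were, so this variant also works on mixed-type frames where `np.isnan(df.values)` would not. A small sketch checking that it agrees with the two-step `.all().all()` on a few made-up frames:

```python
import numpy as np
import pandas as pd

frames = {
    "all NaN (float)": pd.DataFrame(np.full((3, 3), np.nan)),
    "one value set": pd.DataFrame({"a": [np.nan, 2.0], "b": [np.nan, np.nan]}),
    "mixed dtypes": pd.DataFrame({"x": [np.nan, np.nan], "y": [None, "text"]}),
}

for name, frame in frames.items():
    mask = frame.isnull()  # boolean DataFrame, same shape as frame
    fast = mask.values.all()  # reduces over every element of the 2-D bool array
    two_step = mask.all().all()  # per-column reduction, then over the columns
    assert fast == two_step
    print(f"{name}: {bool(fast)}")
```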
Benchmarks
Small
In [149]: df.shape
Out[149]: (5, 5)
In [150]: %timeit df.isnull().values.all()
10000 loops, best of 3: 112 µs per loop
In [151]: %timeit df.isnull().all().all()
1000 loops, best of 3: 271 µs per loop
Large
In [153]: df.shape
Out[153]: (1000, 1000)
In [154]: %timeit df.isnull().values.all()
10 loops, best of 3: 26.6 ms per loop
In [155]: %timeit df.isnull().all().all()
10 loops, best of 3: 40.8 ms per loop
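Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.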