pandas: checking for nulls: what am I doing wrong when applying this function row by row?

Question

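I want to check whether certain records are null in some (not all) columns of a DataFrame. To do that, I want to create True/False fields, which I then need to group by. For example, if I have a field 'x', I want to create a field 'x POPULATED', and so on.

In my context, null means NaN, the string 'not available', or the string 'nan'.

I tried the code below, but it doesn't work. I get: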

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

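My questions are:

What exactly am I doing wrong?
Is there a better, vectorized way to do this? Even if there is, and I'm fairly sure there is, I would still like to understand what I did wrong in my code.

Code: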

import numpy as np, pandas as pd
df=pd.DataFrame()
df['a']=np.arange(0,10)
df['b']='test'
df['c']='nothing to test here'
df.iloc[0,:]=np.nan
df.iloc[1,1]='not available'
df.iloc[2,1]='nan'

def checknull(x):
    if pd.isnull(x) or x=='not available' or x=='nan':
        return False
    else:
        return True
    
for c in ['a','b']:
    df[c + 'populated'] =  df.apply( lambda x: checknull(df[c]) , axis=1 )

Answer 1

For a vectorized solution, build a mask with isnull and isin, then invert it with ~:

df1 = ~(df[['a','b']].isnull() | (df[['a','b']].isin(['not available','nan'])))

print (df1)
       a      b
0  False  False
1   True  False
2   True  False
3   True   True
4   True   True
5   True   True
6   True   True
7   True   True
8   True   True
9   True   True
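The one-liner above can also be read as three separate steps. The following is a minimal sketch on a smaller, made-up frame; the names cols, placeholders, is_missing, is_placeholder and populated are illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the one from the question).
df = pd.DataFrame({
    "a": [np.nan, 1.0, 2.0, 3.0],
    "b": [np.nan, "not available", "nan", "test"],
})

cols = ["a", "b"]                        # columns to check
placeholders = ["not available", "nan"]  # strings that count as null

is_missing = df[cols].isnull()                # True where the value is NaN
is_placeholder = df[cols].isin(placeholders)  # True where it is a placeholder string
populated = ~(is_missing | is_placeholder)    # True where a real value is present

print(populated)
```

Each intermediate result is a boolean DataFrame with the same shape as df[cols], which is why | and ~ can combine them element by element.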

Finally, add the new columns to the original DataFrame with join and add_suffix:

df = df.join(df1.add_suffix('populated'))
print (df)
     a              b                     c  apopulated  bpopulated
0  NaN            NaN                   NaN       False       False
1  1.0  not available  nothing to test here        True       False
2  2.0            nan  nothing to test here        True       False
3  3.0           test  nothing to test here        True        True
4  4.0           test  nothing to test here        True        True
5  5.0           test  nothing to test here        True        True
6  6.0           test  nothing to test here        True        True
7  7.0           test  nothing to test here        True        True
8  8.0           test  nothing to test here        True        True
9  9.0           test  nothing to test here        True        True
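Since the question mentions grouping by the new flags, here is a hedged sketch of what that step might look like. The question does not say which aggregation is wanted, so counting rows per flag combination with size() is an assumption, and the frame is a small made-up one.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the one from the question).
df = pd.DataFrame({
    "a": [np.nan, 1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, "not available", "nan", "test", "test"],
})

cols = ["a", "b"]
mask = ~(df[cols].isnull() | df[cols].isin(["not available", "nan"]))
df = df.join(mask.add_suffix("populated"))

# Count how many rows fall into each combination of flags.
counts = df.groupby(["apopulated", "bpopulated"]).size()
print(counts)
```

Any other aggregation (sum, mean, and so on) could be substituted for size(), depending on what the grouped result is needed for.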

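In your original code you need x[c] instead of df[c]. With axis=1, apply passes each row to the lambda as x, so x[c] is a single value, whereas df[c] is the entire column. pd.isnull(df[c]) therefore returns a Series, and using a Series as the condition of an if statement raises the ValueError: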

for c in ['a','b']:
    df[c + 'populated'] =  df.apply( lambda x: checknull(x[c]) , axis=1 )

print (df)
     a              b                     c  apopulated  bpopulated
0  NaN            NaN                   NaN       False       False
1  1.0  not available  nothing to test here        True       False
2  2.0            nan  nothing to test here        True       False
3  3.0           test  nothing to test here        True        True
4  4.0           test  nothing to test here        True        True
5  5.0           test  nothing to test here        True        True
6  6.0           test  nothing to test here        True        True
7  7.0           test  nothing to test here        True        True
8  8.0           test  nothing to test here        True        True
9  9.0           test  nothing to test here        True        True
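The difference between df[c] and x[c] can be reproduced in isolation. This is a minimal sketch on a made-up one-column frame; the names whole_column, raised and first_row are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"b": [np.nan, "not available", "test"]})

# df["b"] is the whole column, so pd.isnull returns one boolean per row.
whole_column = pd.isnull(df["b"])
print(type(whole_column).__name__)  # Series

# A Series has no single truth value, so `if` cannot evaluate it.
raised = False
try:
    if whole_column:
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Inside a row-wise apply, x is one row, so x["b"] is a single scalar.
first_row = df.iloc[0]
print(pd.isnull(first_row["b"]))  # True
```

The same reasoning applies to the == comparisons in checknull: compared against a whole column they also produce a Series, so `or` fails in the same way.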

Solution 1
1 accepted 2017-11-09 12:04:57