
Finding NaN Values in Pandas MultiIndex

I'm trying to find the difference between two Pandas MultiIndex objects of different shapes. I've used:

df1.index.difference(df2)

and receive

TypeError: '<' not supported between instances of 'float' and 'str'

My indices are str and datetime, but I suspect there are NaNs hidden there (the floats). Hence my question:

What's the best way to find NaNs anywhere in the MultiIndex? How does one iterate through the levels and names? Can I use something like isna()?

Many functions are not implemented for MultiIndex.

You need to convert the MultiIndex to a DataFrame with MultiIndex.to_frame first:

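import numpy as np
import pandas as pd

# sample from W-B's answer: a MultiIndex with a NaN in the first level
idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])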

print (idx.to_frame())
         0  1
NaN 1  NaN  1
1   1  1.0  1
    2  1.0  2

print (idx.to_frame().isnull())
           0      1
NaN 1   True  False
1   1  False  False
    2  False  False
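For the part of the question about iterating over levels and names, here is a minimal sketch. It assumes the same sample idx as above and relies on MultiIndex.get_level_values and Index.isna; the name nan_counts is only illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])

# Count the NaNs in each level. Unnamed levels have name None,
# so fall back to the level position as the dictionary key.
nan_counts = {}
for pos, name in enumerate(idx.names):
    values = idx.get_level_values(pos)  # flat Index holding this level's values
    key = pos if name is None else name
    nan_counts[key] = int(values.isna().sum())

print(nan_counts)
```

This avoids building a full DataFrame when you only need to know which level contains the missing values.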

Or use the DataFrame constructor:

print (pd.DataFrame(idx.tolist()))
     0  1
0  NaN  1
1  1.0  1
2  1.0  2

The conversion is needed because isna is not defined for a MultiIndex directly:

print (pd.isnull(idx))

NotImplementedError: isna is not defined for MultiIndex

EDIT:

To select the rows that contain at least one NaN, use any with boolean indexing:

df = idx.to_frame()
print (df[df.isna().any(axis=1)])
        0  1
NaN 1 NaN  1

It is also possible to filter the MultiIndex itself, but then you need to add MultiIndex.remove_unused_levels:

print (idx[idx.to_frame().isna().any(axis=1)].remove_unused_levels())
MultiIndex(levels=[[], [1]],
           labels=[[-1], [0]])
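Going the other way, the same mask can be inverted to keep only the complete entries before calling difference. This is a sketch under assumptions: other is a made-up second index used only to show the call, and whether dropping NaNs removes the TypeError from the question depends on the real data.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])

# Boolean mask: True where any level of the entry is NaN.
has_nan = idx.to_frame().isna().any(axis=1).to_numpy()

# Keep only the entries without NaN.
clean = idx[~has_nan]

# 'other' is a hypothetical second index, taken from 'clean' for illustration.
other = clean[:1]
diff = clean.difference(other)
print(diff.tolist())
```

Converting the mask with to_numpy() keeps the indexing positional, so it does not depend on how the boolean Series is aligned with the index.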

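We can use reset_index, then isna:

import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])
df = pd.DataFrame([1, 2, 3], index=idx)
# unnamed levels become columns level_0, level_1 after reset_index
df.reset_index().filter(like='level_').isna()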
Out[304]: 
   level_0  level_1
0     True    False
1    False    False
2    False    False
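Note that filter(like='level_') only matches unnamed levels, because reset_index names those columns level_0, level_1, and so on. If the levels have names, a possible variant is to select the columns by df.index.names instead. The level names key and tag and the column value below are invented for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical named MultiIndex with a NaN in the 'key' level.
idx = pd.MultiIndex.from_tuples(
    [(np.nan, "a"), ("x", "b")], names=["key", "tag"]
)
df = pd.DataFrame({"value": [1, 2]}, index=idx)

# Remember the level names before flattening the index into columns.
level_names = list(df.index.names)
mask = df.reset_index()[level_names].isna()
print(mask)
```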
