
pandas comparison operator `==` not working as expected when column contains `List` instead of `Tuple`

import pandas as pd
import numpy as np

df = pd.DataFrame({'Li':[[1,2],[5,6],[8,9]],'Tu':[(1,2),(5,6),(8,9)]})
df
       Li      Tu
0  [1, 2]  (1, 2)
1  [5, 6]  (5, 6)
2  [8, 9]  (8, 9)

The comparison works fine for the tuple column:

df.Tu == (1,2)
0     True
1    False
2    False
Name: Tu, dtype: bool

For the list column it raises a ValueError:

df.Li == [1,2]

ValueError: Lengths must match to compare

The problem is that pandas treats a list on the right-hand side as array-like (and lists aren't hashable), so one fix is to convert each list to a tuple and compare tuples:

print (df.Li.map(tuple) == (1,2))
0     True
1    False
2    False
Name: Li, dtype: bool

Or in list comprehension:

mask = [tuple(x) == (1,2) for x in df.Li]
#alternative
mask = [x == [1,2] for x in df.Li]
print (mask)
[True, False, False]
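
The list-comprehension mask above is a plain Python list. As a minimal sketch, assuming you want to filter the frame with it, it can be wrapped in a Series that shares the frame's index (the name `matches` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Li': [[1, 2], [5, 6], [8, 9]]})

# Compare each list in plain Python, then align the result with df's index
mask = pd.Series([x == [1, 2] for x in df.Li], index=df.index)

# Boolean indexing keeps only the rows where the list equals [1, 2]
matches = df[mask]
print(matches)
```

This keeps working when the lists have different lengths, because each comparison is an ordinary Python list comparison.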

If all lists have the same lengths:

mask = (np.array(df.Li.tolist()) == [1,2]).all(axis=1)
print (mask)
[ True False False]

The problem is that pandas treats [1, 2] as a Series-like object and tries to compare each element of df.Li with the corresponding element of [1, 2], hence the error:

ValueError: Lengths must match to compare

You cannot compare a list of length 2 with a Series of length 3 (df.Li). To verify this, compare against a list of length 3:

print(df.Li == [1, 2, 3])

Output

0    False
1    False
2    False
Name: Li, dtype: bool

This raises no error, but returns False for every row, as expected. To compare against a list, you can do the following:

# this creates an array where each element is [1, 2]
data = np.empty(3, dtype=object)  # np.object was removed in NumPy 1.24; use the builtin object
data[:] = [[1, 2] for _ in range(3)]

print(df.Li == data)

Output

0     True
1    False
2    False
Name: Li, dtype: bool
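
The element-wise behaviour described above is perhaps easier to see on a plain numeric Series. This is only an illustrative sketch; the Series `s` and the variable names are not from the original question:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# A list of the same length is compared position by position
same_length = (s == [1, 5, 3]).tolist()  # [True, False, True]

# A list of a different length cannot be lined up with the Series
try:
    s == [1, 2]
    raised = False
except ValueError:
    raised = True

print(same_length, raised)
```

So `df.Li == [1, 2]` fails for the same reason: the list is read as two values to line up against three rows, not as a single value to look for.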

All in all, it seems like a bug on the pandas side.

My column 'vectors' contained numpy ndarrays, and I got the same error when I tried to compare it to another ndarray, 'centroid'. The following works for numpy ndarrays:

df['vectors'].apply(lambda x: ((x==centroid).sum() == centroid.shape[0]))

The same approach also works for lists:

df.Li.apply(lambda x: x==[1,2])
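
For the ndarray case, a possible alternative to counting matching elements is `np.array_equal`, which returns a single boolean and returns False when the shapes differ. A minimal sketch, with made-up data standing in for the 'vectors' column and 'centroid' (the frame name `df_vec` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the 'vectors' column and the 'centroid' array
df_vec = pd.DataFrame({'vectors': [np.array([1.0, 2.0]), np.array([3.0, 4.0])]})
centroid = np.array([1.0, 2.0])

# np.array_equal gives one True/False per row
mask = df_vec['vectors'].apply(lambda x: np.array_equal(x, centroid))
print(mask.tolist())
```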
