Lambda function - TypeError: unhashable type: 'numpy.ndarray'

I have a 2-D numpy array with categorical data in every column.

I am trying to encode each column separately, while also handling values in the test data that were not seen during training.

I have this code:

from sklearn.preprocessing import LabelEncoder

for column in range(X_train.shape[1]):
    label_encoder = LabelEncoder()
    X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
    mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
    map_function = lambda x: mappings.get(x, -1)
    X_test[:, column] = map_function(X_test[:, column])

and I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-***********> in <module>
     39         mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
     40         map_function = lambda x: mappings.get(x, -1)
---> 41         X_test[:, column] = map_function(X_test[:, column])
     42 
     43 

<ipython-input-***********> in <lambda>(x)
     38         X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
     39         mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
---> 40         map_function = lambda x: mappings.get(x, -1)
     41         X_test[:, column] = map_function(X_test[:, column])
     42 

TypeError: unhashable type: 'numpy.ndarray'

How can I fix this?

In general, would you suggest a better way to do what I want to do?

PS

I tried to do this to see what is happening:

for column in range(X_train.shape[1]):
    label_encoder = LabelEncoder()
    X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
    mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))

    try:
        map_function = lambda x: mappings.get(x, -1)
        X_test[:, column] = map_function(X_test[:, column])
    except:
        print(X_test[:, column])
        for i in range(X_test[:, column].shape[0]):
            if isinstance(X_test[i, column],np.ndarray):
                print(X_test[i, column])
        print()

but nothing was printed by print(X_test[i, column]), so I am not sure whether there is any numpy array within X_test[:, column].

I also checked with if not isinstance(X_test[i, column], str) and again nothing was printed, so every element in each column must be a string.

PS2

When I do this:

 for i in range(X_test[:, column].shape[0]):
     X_test[i, column] = mappings.get(X_test[i, column], -1)

it works with no error, which means that, because of the way I defined the lambda function, I am passing the whole numpy array to it rather than its elements one at a time.

What happens here is that map_function receives the whole column vector (a numpy array) rather than a single element. The array is then used as the key in mappings.get, and since a numpy array is not hashable it cannot be a dictionary key, hence the error.

Replace the line
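The difference between a per-element lookup and a whole-array lookup can be reproduced in a few lines. This is a minimal sketch: the mapping and the column values below are made up for illustration and are not the asker's data.

```python
import numpy as np

# Hypothetical mapping and test column, for illustration only.
mappings = {"a": 0, "b": 1}
column = np.array(["a", "b", "c"])

# A single element is hashable, so the lookup works;
# the unseen value "c" falls back to the default -1.
single = mappings.get(column[2], -1)

# Passing the whole array makes dict.get try to hash the array itself.
try:
    mappings.get(column, -1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(single)
print(error_message)
```

The lambda in the question does the second kind of call, because X_test[:, column] is an array and the lambda forwards it unchanged to mappings.get.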

map_function = lambda x: mappings.get(x, -1)

with

map_function = np.vectorize(lambda x: mappings.get(x, -1))

np.vectorize calls the lambda once per element, so each element (rather than the whole array) is used as the key in the mapping. As long as every element is hashable, this works.
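A small sketch of the vectorized lookup on toy data. The mapping and array are hypothetical, and the otypes=[int] argument is an optional addition not in the original suggestion; it fixes the output dtype up front instead of letting np.vectorize infer it from the first result.

```python
import numpy as np

# Hypothetical mapping, as it might come out of a fitted encoder.
mappings = {"a": 0, "b": 1}

# Hypothetical test data; "c" does not appear in the mapping.
X_test = np.array([["a", "x"], ["c", "y"], ["b", "x"]], dtype=object)

# np.vectorize applies the lambda to each element in turn.
map_function = np.vectorize(lambda x: mappings.get(x, -1), otypes=[int])

encoded = map_function(X_test[:, 0])
print(encoded.tolist())
```

The result is kept in a separate array here so that its integer dtype does not depend on the dtype of X_test. Note that np.vectorize is a convenience wrapper around a Python-level loop, so it is not expected to be faster than the explicit loop shown in PS2.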
