Lambda function - TypeError: unhashable type: 'numpy.ndarray'
I have a numpy 2-D array with categorical data in every column.
I try to encode the data of each column separately, while also dealing with unseen data in each case.
I have this code:
from sklearn.preprocessing import LabelEncoder

for column in range(X_train.shape[1]):
    label_encoder = LabelEncoder()
    X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
    mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
    map_function = lambda x: mappings.get(x, -1)
    X_test[:, column] = map_function(X_test[:, column])
and I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-***********> in <module>
39 mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
40 map_function = lambda x: mappings.get(x, -1)
---> 41 X_test[:, column] = map_function(X_test[:, column])
42
43
<ipython-input-***********> in <lambda>(x)
38 X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
39 mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
---> 40 map_function = lambda x: mappings.get(x, -1)
41 X_test[:, column] = map_function(X_test[:, column])
42
TypeError: unhashable type: 'numpy.ndarray'
How can I fix this?
In general, would you suggest a better way to do what I want to do?
PS
I tried the following to see what is happening:
for column in range(X_train.shape[1]):
    label_encoder = LabelEncoder()
    X_train[:, column] = label_encoder.fit_transform(X_train[:, column])
    mappings = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))
    try:
        map_function = lambda x: mappings.get(x, -1)
        X_test[:, column] = map_function(X_test[:, column])
    except:
        print(X_test[:, column])
        for i in range(X_test[:, column].shape[0]):
            if isinstance(X_test[i, column], np.ndarray):
                print(X_test[i, column])
        print()
but actually nothing was printed by print(X_test[i, column]), so I am not sure if there is any numpy array within X_test[:, column].
I have actually also checked with if not isinstance(X_test[i, column], str), and again nothing was printed, so everything in X_train[:, column] at each column must be a string.
PS2
When I do this:
for i in range(X_test[:, column].shape[0]):
    X_test[i, column] = mappings.get(X_test[i, column], -1)
it actually works with no error, so it means that, because of the way I have defined the lambda function, the whole numpy array is sent to it rather than its elements one by one.
What happens here is that what is sent to map_function is the whole vector, which cannot be used as a key in a dictionary because it is not hashable, hence the error.
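A minimal sketch of the failure, using a toy mappings dict in place of the one built from label_encoder.classes_: hashing a whole array raises the TypeError, while a single element (a string) works fine.

```python
import numpy as np

# Toy mapping, standing in for dict(zip(label_encoder.classes_, ...))
mappings = {"a": 0, "b": 1}

arr = np.array(["a", "b", "a"])

# Passing the whole array makes dict.get try to hash the array itself:
try:
    mappings.get(arr, -1)
except TypeError as e:
    print(e)  # unhashable type: 'numpy.ndarray'

# A single element is a string, which is hashable, so this works:
print(mappings.get(arr[0], -1))  # 0
```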
Replace the line
map_function = lambda x: mappings.get(x, -1)
with
map_function = np.vectorize(lambda x: mappings.get(x, -1))
This will cause each element to be used as the key in the mapping, and as long as all of them are indeed hashable it will work.
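A small sketch of the vectorized lookup on toy data (again with a stand-in mappings dict); note how an unseen value falls back to -1:

```python
import numpy as np

# Toy mapping, standing in for the one built from label_encoder.classes_
mappings = {"red": 0, "green": 1, "blue": 2}

# np.vectorize applies the lambda element-wise over the array
map_function = np.vectorize(lambda x: mappings.get(x, -1))

X_test_col = np.array(["green", "purple", "red"])  # "purple" is unseen
print(map_function(X_test_col))  # [ 1 -1  0]
```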