
How to convert one-hot encodings into integers?

I have a numpy array data set with shape (100,10). Each row is a one-hot encoding. I want to convert it into an nd-array with shape (100,), such that each row vector becomes an integer denoting the index of its nonzero entry. Is there a quick way of doing this using numpy or tensorflow?

You can use numpy.argmax or tf.argmax. Example:

import numpy as np  
a  = np.array([[0,1,0,0],[1,0,0,0],[0,0,0,1]])
print('np.argmax(a, axis=1): {0}'.format(np.argmax(a, axis=1)))

Output:

np.argmax(a, axis=1): [1 0 3]
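
The TensorFlow call mirrors the NumPy one. A minimal sketch, assuming TensorFlow 2.x with eager execution enabled (the array is the same one used in the NumPy example):

```python
import numpy as np
import tensorflow as tf

# Same one-hot rows as in the NumPy example.
a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])

# tf.argmax returns an integer tensor; .numpy() assumes TF 2.x eager mode.
indices = tf.argmax(a, axis=1).numpy()
print(indices)
```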

You may also want to look at sklearn.preprocessing.LabelBinarizer.inverse_transform.
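
A rough sketch of the scikit-learn route, assuming the class labels are simply the column indices 0..3 (that choice is an assumption for illustration; with real data you would fit the binarizer on your actual labels):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])

# Assumption: the classes are the column indices 0, 1, 2, 3.
lb = LabelBinarizer()
lb.fit(np.arange(a.shape[1]))

# inverse_transform maps each one-hot row back to its class label.
labels = lb.inverse_transform(a)
print(labels)
```

This is mostly useful when the one-hot matrix was produced by a LabelBinarizer in the first place, since the labels then need not be integers.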

As pointed out by Franck Dernoncourt, since a one-hot encoding has a single 1 and the rest are zeros, you can use argmax for this particular example. In general, if you want to find a value in a numpy array, you'll probably want to consult numpy.where. Also, see this Stack Exchange question:

Is there a NumPy function to return the first index of something in an array?

Since a one-hot vector is a vector with all 0s and a single 1, you can do something like this:

>>> import numpy as np
>>> a = np.array([[0,1,0,0],[1,0,0,0],[0,0,0,1]])
>>> [np.where(r==1)[0][0] for r in a]
[1, 0, 3]

This just builds a list of the index of the 1 in each row. The [0][0] indexing is just to ditch the structure (a tuple containing an array) returned by np.where, which is more than you asked for.

For any particular row, you just index into a. For example, in the zeroth row the 1 is found at index 1.

>>> np.where(a[0]==1)[0][0]
1
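
The per-row loop can also be avoided by calling np.where once on the whole 2-D array. A small sketch, which assumes every row contains exactly one 1 (otherwise the returned columns no longer line up one-to-one with the rows):

```python
import numpy as np

a = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])

# On a 2-D array, np.where returns (row_indices, column_indices) in
# row-major order, so with one 1 per row the columns are the answer.
rows, cols = np.where(a == 1)
print(cols)
```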

Simply use np.argmax(x, axis=1).

Example:

import numpy as np
array = np.array([[0, 1, 0, 0], [0, 0, 0, 1]])
print(np.argmax(array, axis=1))
> [1 3]
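
One caveat worth keeping in mind: np.argmax returns 0 for an all-zero row and the first maximum for a row with several 1s, so malformed input is decoded silently. A minimal sketch of a guarded version follows; the helper name one_hot_to_indices is hypothetical and not part of NumPy:

```python
import numpy as np

def one_hot_to_indices(x):
    """Hypothetical helper: argmax with a validity check."""
    x = np.asarray(x)
    # A valid one-hot row holds only 0s and 1s and sums to exactly 1.
    is_binary = np.isin(x, (0, 1)).all()
    if not is_binary or not (x.sum(axis=1) == 1).all():
        raise ValueError("every row must contain exactly one 1 and 0s elsewhere")
    return np.argmax(x, axis=1)

print(one_hot_to_indices([[0, 1, 0, 0], [0, 0, 0, 1]]))
```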

While I strongly suggest using numpy for speed, mpu.ml.one_hot2indices(one_hots) shows how to do it without numpy. Simply pip install mpu --user --upgrade.

Then you can do

>>> from mpu.ml import one_hot2indices
>>> one_hot2indices([[1, 0], [1, 0], [0, 1]])
[0, 0, 1]
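
If installing a package is not an option, the same result can be had with plain lists. This is only a sketch of the idea, not mpu's actual implementation; the function name one_hots_to_indices is made up for illustration:

```python
def one_hots_to_indices(one_hots):
    """Hypothetical stand-in for a NumPy-free decoder (not mpu's code)."""
    # list.index(1) returns the position of the first 1 in each row
    # and raises ValueError if a row contains no 1 at all.
    return [row.index(1) for row in one_hots]

print(one_hots_to_indices([[1, 0], [1, 0], [0, 1]]))
```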

What I do in these cases is something like this. The idea is to interpret the one-hot vector as an index into a 0,1,2,3,4... array.

# Define stuff
import numpy as np
one_hots = np.zeros([100,10])
for k in range(100):
    one_hots[k,:] = np.random.permutation([1,0,0,0,0,0,0,0,0,0])

# Finally, the trick
ramp = np.tile(np.arange(0,10),[100,1])
integers = ramp[one_hots==1].ravel()

I prefer this trick because I feel np.argmax and the other suggested solutions may be slower than indexing (although indexing may consume more memory).
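
Since the speed difference is only a hunch, it may be worth measuring on your own data. A rough sketch, assuming valid one-hot input built from random labels (the function names by_indexing and by_argmax are just for this comparison, and the timings will vary by machine and array size):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=100)
one_hots = np.eye(10)[labels]            # shape (100, 10), one 1 per row

ramp = np.tile(np.arange(10), (100, 1))  # each row is 0..9

def by_indexing():
    # Boolean-mask the ramp with the one-hot matrix.
    return ramp[one_hots == 1]

def by_argmax():
    return np.argmax(one_hots, axis=1)

# Both approaches should agree on valid one-hot input.
assert np.array_equal(by_indexing(), by_argmax())

print("indexing:", timeit.timeit(by_indexing, number=10_000))
print("argmax:  ", timeit.timeit(by_argmax, number=10_000))
```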

def int_to_onehot(n, n_classes):
    v = [0] * n_classes
    v[n] = 1
    return v

def onehot_to_int(v):
    return v.index(1)


>>> v = int_to_onehot(2, 5)
>>> v
[0, 0, 1, 0, 0]


>>> i = onehot_to_int(v)
>>> i
2

You can use this simple code:

a=[[0,0,0,0,0,1,0,0,0,0]]
j=0
for i in a[0]:
    if i==1:
        print(j)
    else:
        j+=1

5

from numpy import argmax

def one_hot_decode(encoded_seq):
    # Decode each one-hot vector to the index of its largest entry.
    return [argmax(vector) for vector in encoded_seq]
