"Convert an array of indices to a 1-hot encoded numpy array"

Question

Let's say I have a 1d numpy array:

a = array([1,0,3])

Answer 1

Your array a defines the columns of the nonzero elements in the output array. You also need to define the rows and then use fancy indexing:

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max()+1))
>>> b[np.arange(a.size),a] = 1
>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
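
To see why the indexing line works, note that NumPy pairs the two index arrays element by element, so it writes to positions (0, 1), (1, 0) and (2, 3). A minimal sketch, assuming you want integer output and an explicit class count (num_classes and rows are names introduced here, not part of the answer):

```python
import numpy as np

a = np.array([1, 0, 3])
num_classes = 4  # assumed; use a.max() + 1 to infer it from the data

# Row indices 0..n-1 are paired element-wise with the column indices in a.
rows = np.arange(a.size)
b = np.zeros((a.size, num_classes), dtype=int)
b[rows, a] = 1

print(list(zip(rows.tolist(), a.tolist())))  # the (row, column) positions set to 1
print(b)
```

Passing the class count explicitly keeps the output width stable when the largest label happens to be missing from a particular batch.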

Answer 2

>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

Answer 3

In case you are using keras, there is a built-in utility for that:

from keras.utils import to_categorical  # older releases: keras.utils.np_utils

categorical_labels = to_categorical(int_labels, num_classes=3)

And it does pretty much the same as @YXD's answer (see source code).

Answer 4

Here is what I find useful:

def one_hot(a, num_classes):
  return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

Here num_classes stands for the number of classes you have. So if you have a vector a with shape (10000,), this function transforms it to (10000, C). Note that a is zero-indexed, i.e. one_hot(np.array([0, 1]), 2) will give [[1, 0], [0, 1]].

Exactly what you wanted to have, I believe.
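
One detail worth checking is the output shape for very small inputs. The sketch below repeats the function so that it runs on its own; the label values are made up for illustration:

```python
import numpy as np

def one_hot(a, num_classes):
    # Same function as above, repeated so the snippet is self-contained.
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

labels = np.array([2, 0, 1, 2])
encoded = one_hot(labels, 3)
print(encoded.shape)  # (4, 3)

# Caveat: np.squeeze drops every length-1 axis, so a single label
# comes back as a 1-D vector rather than a (1, C) matrix.
single = one_hot(np.array([2]), 3)
print(single.shape)  # (3,)
```

If downstream code always expects a 2-D array, dropping the np.squeeze call is one way to keep the (n, C) shape for every input length.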

PS: the source is Sequence models - deeplearning.ai

Answer 5

You can also use numpy's eye function:

numpy.eye(num_classes)[labels]  # num_classes: number of classes; labels: vector containing the labels

Answer 6

You can use sklearn.preprocessing.LabelBinarizer:

Example:

import sklearn.preprocessing
a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))

Output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]

Among other things, you may initialize sklearn.preprocessing.LabelBinarizer() so that the output of transform is sparse, for example with sparse_output=True.

Answer 7

You can use the following code to convert to a one-hot vector:

Let x be the usual class vector, a single column with classes from 0 up to some number:

import numpy as np
np.eye(x.max()+1)[x]

If 0 is not a class (labels start at 1), remove the +1 and index with x - 1 instead of x, since the largest label would otherwise be out of range.
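
For labels that start at 1, dropping the +1 alone would leave the largest label out of range, so the indices need shifting as well. A minimal sketch, assuming the labels run from 1 to x.max() with no label 0 (the values are made up):

```python
import numpy as np

x = np.array([1, 3, 2, 3])  # labels run from 1 to 3; 0 is not a class

# Shift to zero-based indices so the largest label maps to the last column.
one_hot = np.eye(x.max())[x - 1]
print(one_hot)
```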

Answer 8

For 1-hot encoding:

   one_hot_encode = pandas.get_dummies(array)

Enjoy coding.

Answer 9

Here is a function that converts a 1-D vector to a 2-D one-hot array.

#!/usr/bin/env python
import numpy as np

def convertToOneHot(vector, num_classes=None):
    """
    Converts an input 1-D vector of integers into an output
    2-D array of one-hot vectors, where an i'th input value
    of j will set a '1' in the i'th row, j'th column of the
    output array.

    Example:
        v = np.array((1, 0, 4))
        one_hot_v = convertToOneHot(v)
        print(one_hot_v)

        [[0 1 0 0 0]
         [1 0 0 0 0]
         [0 0 0 0 1]]
    """

    assert isinstance(vector, np.ndarray)
    assert len(vector) > 0

    if num_classes is None:
        num_classes = np.max(vector)+1
    else:
        assert num_classes > 0
        assert num_classes > np.max(vector)

    result = np.zeros(shape=(len(vector), num_classes))
    result[np.arange(len(vector)), vector] = 1
    return result.astype(int)

Below is some example usage:

>>> a = np.array([1, 0, 3])

>>> convertToOneHot(a)
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])

>>> convertToOneHot(a, num_classes=10)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

Answer 10

I think the short answer is no. For a more generic case in n dimensions, I came up with this:

# For 2-dimensional data, 4 values
a = np.array([[0, 1, 2], [3, 2, 1]])
z = np.zeros(list(a.shape) + [4])
z[tuple(np.indices(z.shape[:-1])) + (a,)] = 1  # index with a tuple; newer NumPy rejects a list here

I am wondering if there is a better solution; I don't like that I have to build those index sequences in the last two lines. Anyway, I did some measurements with timeit, and it seems that the numpy-based (indices/arange) and the iterative versions perform about the same.
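
The n-dimensional indexing idea can be sketched as follows. This version builds the index as a tuple, which is what newer NumPy releases expect, and compares the result with an identity-matrix lookup; num_classes is a name introduced here for the hard-coded 4:

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 2, 1]])
num_classes = 4  # assumed class count, matching the example above

z = np.zeros(a.shape + (num_classes,))
# np.indices yields one index grid per leading axis; appending a as the
# last index selects the class slot for every position.
z[tuple(np.indices(a.shape)) + (a,)] = 1

# The same result via row lookup in an identity matrix.
assert np.array_equal(z, np.eye(num_classes)[a])
print(z.shape)  # (2, 3, 4)
```

The identity-matrix lookup avoids building the index grids at all, which may be the tidier option when the extra num_classes x num_classes matrix is affordable.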

Answer 11

Just to elaborate on the excellent answer from K3---rnc, here is a more generic version:

def onehottify(x, n=None, dtype=float):
    """1-hot encode x into n classes (n is computed as max(x) + 1 if n is None)."""
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

Also, here is a quick-and-dirty benchmark of this method and a method from the currently accepted answer by YXD (slightly changed, so that they offer the same API, except that the latter works only with 1D ndarrays):

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

The latter method is ~35% faster (MacBook Pro 13 2015), but the former is more general:

>>> import numpy as np
>>> np.random.seed(42)
>>> a = np.random.randint(0, 9, size=(10_000,))
>>> a
array([6, 3, 7, ..., 5, 8, 6])
>>> %timeit onehottify(a, 10)
188 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit onehottify_only_1d(a, 10)
139 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
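
The %timeit lines above only run in IPython. A rough equivalent for a plain script, using the standard-library timeit module, might look like the sketch below; the two functions are repeated so it runs standalone, and the timings it prints will differ from machine to machine:

```python
import timeit
import numpy as np

def onehottify(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

rng = np.random.default_rng(42)
a = rng.integers(0, 9, size=10_000)

# Both functions should agree before their timings are compared.
same = np.array_equal(onehottify(a, 10), onehottify_only_1d(a, 10))
print("results match:", same)

runs = 200  # arbitrary repeat count for this sketch
for fn in (onehottify, onehottify_only_1d):
    seconds = timeit.timeit(lambda: fn(a, 10), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} us per call")
```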

Answer 12

I recently ran into the same kind of problem and found that the solutions above are only satisfying when the labels form a contiguous range. For example, if you want to one-hot encode the following list:

all_good_list = [0,1,2,3,4]

go ahead, the solutions posted above work fine. But consider this data:

problematic_list = [0,23,12,89,10]

If you do it with the methods mentioned above, you will likely end up with 90 one-hot columns. This is because all the answers include something like n = np.max(a)+1. I found a more generic solution that worked for me and wanted to share it:

import numpy as np
import sklearn.preprocessing
sklb = sklearn.preprocessing.LabelBinarizer()
a = np.asarray([1,2,44,3,2])
n = np.unique(a)
sklb.fit(n)
b = sklb.transform(a)

I hope this comes in handy for anyone who has run into the same restriction with the solutions above.
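
A comparable compact encoding is also possible with NumPy alone. The sketch below uses np.unique with return_inverse=True; it is offered as an alternative under that assumption, not as a description of what LabelBinarizer does internally:

```python
import numpy as np

a = np.asarray([1, 2, 44, 3, 2])

# classes holds the sorted distinct labels; inverse maps each element of a
# to its position in classes, which gives compact column indices.
classes, inverse = np.unique(a, return_inverse=True)
b = np.eye(classes.size, dtype=int)[inverse]

print(classes)  # column j of b corresponds to classes[j]
print(b)
```

Keeping classes around lets you map a one-hot row back to its original label with classes[b.argmax(axis=1)].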

Answer 13

This kind of encoding is usually applied to a numpy array. If you are using a numpy array like this:

a = np.array([1,0,3])

then there is a very simple way to convert it to a 1-hot encoding:

out = (np.arange(4) == a[:,None]).astype(np.float32)

That's it.
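
The one-liner relies on broadcasting. A short sketch of the intermediate shapes, with num_classes introduced here in place of the hard-coded 4:

```python
import numpy as np

a = np.array([1, 0, 3])
num_classes = 4  # assumed; the answer above hard-codes 4

# a[:, None] has shape (3, 1) and np.arange(num_classes) has shape (4,),
# so the comparison broadcasts to a (3, 4) boolean matrix.
mask = np.arange(num_classes) == a[:, None]
out = mask.astype(np.float32)

print(mask.shape)
print(out)
```

Each row of mask is True only in the column whose index equals that row's label, so the cast to float produces the one-hot rows directly.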

Answer 14

p will be a 2d ndarray.
We want to know which value is the highest in each row, and put a 1 there and 0 everywhere else.

Clean and easy solution:

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)
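
A runnable version of the three lines above, with a small made-up score matrix standing in for p:

```python
import numpy as np

# Hypothetical scores: one row per sample, one column per class.
p = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.2, 0.6]])

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)  # shape (3, 1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)

print(one_hot)
```

If a row contains ties, np.argmax returns the first maximal index, so exactly one entry per row is still set.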

Answer 15

Here is an example function that I wrote to do this, based on the answers above and my own use case:

def label_vector_to_one_hot_vector(vector, one_hot_size=10):
    """
    Use to convert a column vector to a 'one-hot' matrix

    Example:
        vector: [[2], [0], [1]]
        one_hot_size: 3
        returns:
            [[ 0.,  0.,  1.],
             [ 1.,  0.,  0.],
             [ 0.,  1.,  0.]]

    Parameters:
        vector (np.array): of size (n, 1) to be converted
        one_hot_size (int) optional: size of 'one-hot' row vector

    Returns:
        np.array size (vector.size, one_hot_size): converted to a 'one-hot' matrix
    """
    squeezed_vector = np.squeeze(vector, axis=-1)

    one_hot = np.zeros((squeezed_vector.size, one_hot_size))

    one_hot[np.arange(squeezed_vector.size), squeezed_vector] = 1

    return one_hot

label_vector_to_one_hot_vector(vector=[[2], [0], [1]], one_hot_size=3)

Answer 16

For completeness, here is a simple function that uses only numpy operators:

def probs_to_onehot(output_probabilities):
    # Index of the most probable class in each row.
    argmax_indices_array = np.argmax(output_probabilities, axis=1)
    # Size by the number of columns so that classes that never win still get a column.
    onehot_output_array = np.eye(output_probabilities.shape[1])[argmax_indices_array]
    return onehot_output_array

It takes a probability matrix as input, e.g.:

[[0.03038822 0.65810204 0.16549407 0.3797123 ] ... [0.02771272 0.2760752 0.3280924 0.33458805]]

And it will return

[[0 1 0 0] ... [0 0 0 1]]

Answer 17

Here's a dimensionality-independent standalone solution.

This will convert any N-dimensional array arr of nonnegative integers to a one-hot N+1-dimensional array one_hot, where one_hot[i_1,...,i_N,c] = 1 means arr[i_1,...,i_N] = c. You can recover the input via np.argmax(one_hot, -1).

def expand_integer_grid(arr, n_classes):
    """

    :param arr: N dim array of size i_1, ..., i_N
    :param n_classes: C
    :returns: one-hot N+1 dim array of size i_1, ..., i_N, C
    :rtype: ndarray

    """
    one_hot = np.zeros(arr.shape + (n_classes,))
    axes_ranges = [range(arr.shape[i]) for i in range(arr.ndim)]
    flat_grids = [_.ravel() for _ in np.meshgrid(*axes_ranges, indexing='ij')]
    one_hot[tuple(flat_grids + [arr.ravel()])] = 1
    assert((one_hot.sum(-1) == 1).all())
    assert(np.allclose(np.argmax(one_hot, -1), arr))
    return one_hot
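
The round trip described above can be checked on a small input. The sketch below uses an identity-matrix row lookup rather than the function above, purely to keep the snippet short; arr and n_classes are made-up values:

```python
import numpy as np

arr = np.array([[0, 2], [1, 1], [2, 0]])  # hypothetical 2-D label grid
n_classes = 3

# Row lookup in an identity matrix appends a class axis of length n_classes.
one_hot = np.eye(n_classes)[arr]
recovered = np.argmax(one_hot, axis=-1)

print(one_hot.shape)  # (3, 2, 3)
print(np.array_equal(recovered, arr))
```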

Answer 18

Use the following code. It works best.

def one_hot_encode(x):
    """
    argument
        - x: a list of labels
    return
        - one hot encoding matrix (number of labels, number of class)
    """
    encoded = np.zeros((len(x), 10))

    for idx, val in enumerate(x):
        encoded[idx][val] = 1

    return encoded

Found it here. PS: You don't need to go into the link.
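
The function above fixes the number of classes at 10. A hedged variant that takes the class count as a parameter (num_classes is an addition for illustration, not part of the original) and checks the loop against a vectorized assignment:

```python
import numpy as np

def one_hot_encode(x, num_classes=10):
    """Loop-based one-hot encoding; num_classes is an added parameter."""
    encoded = np.zeros((len(x), num_classes))
    for idx, val in enumerate(x):
        encoded[idx][val] = 1
    return encoded

labels = [3, 0, 9, 3]
looped = one_hot_encode(labels)

# Vectorized equivalent for comparison.
vectorized = np.zeros((len(labels), 10))
vectorized[np.arange(len(labels)), labels] = 1

print(np.array_equal(looped, vectorized))
```

The loop is easy to read, while the vectorized assignment avoids a Python-level iteration over every label.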

Answer 19

Using a Neuraxle pipeline step:

Set up your example

import numpy as np
a = np.array([1,0,3])
b = np.array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Do the actual conversion

from neuraxle.steps.numpy import OneHotEncoder
encoder = OneHotEncoder(nb_columns=4)
b_pred = encoder.transform(a)

Assert it works

assert np.array_equal(b_pred, b)

Link to documentation: neuraxle.steps.numpy.OneHotEncoder

Answer 20

def one_hot(n, class_num, col_wise=True):
  a = np.eye(class_num)[n.reshape(-1)]
  return a.T if col_wise else a

# col_wise=True (default): each column is a one-hot vector
print(one_hot(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 8, 7]), 10))
# col_wise=False: each row is a one-hot vector
print(one_hot(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 8, 7]), 10, col_wise=False))

Answer 21

I find the easiest solution combines np.take and np.eye:

def one_hot(x, num_classes: int):
  return np.take(np.eye(num_classes), x, axis=0)

Answer 22

If using tensorflow, there is one_hot():

import tensorflow as tf
import numpy as np

a = np.array([1, 0, 3])
depth = 4
b = tf.one_hot(a, depth)
# <tf.Tensor: shape=(3, 4), dtype=float32, numpy=
# array([[0., 1., 0., 0.],
#        [1., 0., 0., 0.],
#        [0., 0., 0., 1.]], dtype=float32)>

22 answers (score, date posted)

Answer 1: score 461, accepted, 2015-04-23 18:30:15
Answer 2: score 212, 2016-05-19 12:35:50
Answer 3: score 47, 2017-11-27 11:13:21
Answer 4: score 39, 2018-03-11 07:41:09
Answer 5: score 32, 2018-04-12 07:14:13
Answer 6: score 30, 2017-02-16 02:15:32
Answer 7: score 6, 2019-05-26 15:29:27
Answer 8: score 6, 2020-04-10 23:27:13
Answer 9: score 5, 2016-09-14 00:02:01
Answer 10: score 2, 2016-10-11 22:26:38
Answer 11: score 2, 2018-01-17 14:08:48
Answer 12: score 1, 2018-01-25 13:10:05
Answer 13: score 1, 2018-08-30 06:36:17
Answer 14: score 1, 2018-11-03 10:17:49
Answer 15: score 0, 2018-01-06 18:12:30
Answer 16: score 0, 2018-06-05 10:04:54
Answer 17: score 0, 2018-07-30 01:04:32
Answer 18: score 0, 2019-02-27 18:33:14
Answer 19: score 0, 2019-12-10 07:39:15
Answer 20: score 0, 2021-05-09 20:53:24
Answer 21: score 0, 2022-02-03 00:05:24
Answer 22: score -1, 2020-10-20 11:11:19
