
One hot encoding a numpy array

I'm working on an image classification problem where the training labels are given as a 1-D numpy array, such as [1,2,3,2,2,2,4,4,3,1]. I used the following loop to one-hot encode them:

train_y = []
for label in train_label:
    if label == 0:
        train_y.append([1,0,0,0])
    elif label == 1:
        train_y.append([0,1,0,0])
    elif label == 2:
        train_y.append([0,0,1,0])
    elif label == 3:
        train_y.append([0,0,0,1])

I also need the length of each one-hot vector to equal the number of distinct labels, i.e. len(set(train_labels)). The loop above is not a good way to do this. Please recommend a better method.

It's a good habit to use numpy for arrays. np.unique() determines the distinct labels present in train_labels. Inside the loop, ix holds the indices, obtained from np.nonzero(), at which train_labels == unique_tl[iy]; those rows get a 1 in column iy.

import numpy as np

train_labels = np.array([2,5,8,2,5,8])
unique_tl = np.unique(train_labels)

NL = len(train_labels)               # number of samples, 6
nl = len(unique_tl)                  # number of distinct labels, 3
target = np.zeros((NL,nl),dtype=int)

for iy in range(nl):
    ix = np.nonzero(train_labels == unique_tl[iy])[0]  # rows whose label is unique_tl[iy]
    target[ix,iy] = 1

gives

target
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

I'll think about a possibility to eliminate the for-loop.
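One possible way to drop the for-loop, offered as a sketch rather than the answer's own follow-up: np.unique() accepts return_inverse=True, which returns, for each sample, the position of its label within the sorted unique labels. Those positions can be used directly as column indices. The variable name inverse is chosen here for illustration.

```python
import numpy as np

train_labels = np.array([2, 5, 8, 2, 5, 8])

# unique_tl: sorted distinct labels
# inverse:   for each sample, the position of its label within unique_tl
unique_tl, inverse = np.unique(train_labels, return_inverse=True)

target = np.zeros((train_labels.size, unique_tl.size), dtype=int)
# A single fancy-indexing assignment sets target[i, inverse[i]] = 1 for every row i
target[np.arange(train_labels.size), inverse] = 1

print(target)
```

This should produce the same 6x3 array as the loop version, since column k again corresponds to unique_tl[k].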

If [2,5,8] is meant as part of [0,1,2,3,4,5,6,7,8], then you can use this answer

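Make a vector of zeros for each label, and set only the entry at the label's index to 1:

import numpy as np

train_y = []
for label in train_label:            # each label must be an integer in 0..num_classes-1
    target = np.zeros(num_classes)   # num_classes: total number of classes
    target[label] = 1
    train_y.append(target)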
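The zeros-and-set idea above can also be applied to the whole label array at once. The following is a minimal sketch, assuming the labels are integers in the range 0..num_classes-1; the helper name one_hot and the sample labels are hypothetical and only serve the illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: one-hot encode integer labels in 0..num_classes-1."""
    labels = np.asarray(labels)
    out = np.zeros((labels.size, num_classes), dtype=int)
    # Row i gets a single 1 in the column given by labels[i]
    out[np.arange(labels.size), labels] = 1
    return out

train_label = np.array([0, 2, 3, 1, 2])   # made-up sample labels
train_y = one_hot(train_label, num_classes=4)
print(train_y)
```

If the labels are not contiguous (for example [2,5,8]), they would first need to be mapped to 0..n-1, for instance via np.unique() as in the earlier answer.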


 