One-hot encodings in Keras without for loops

I want to generate one-hot encodings for a list of sequences.

def encode_output(sequences, vocab_size):
  y = np.zeros([sequences.shape[0], sequences.shape[1], vocab_size], dtype='int16')
  for i in range(sequences.shape[0]):
    y[i] = keras.utils.to_categorical(sequences[i], num_classes=vocab_size, dtype='int16')
  return y

sequences is a 2-D NumPy array:

array([[  23,    4,  563, ...,    0,    0,    0],
       [3480,    3,   86, ...,    0,    0,    0],
       [   9,  930,    6, ...,    0,    0,    0],
       ...,
       [ 507, 1408,    0, ...,    0,    0,    0],
       [4447,   13,  642, ...,    0,    0,    0],
       [   1,  195, 2618, ...,    0,    0,    0]], dtype=int32)

My code works fine, but is there a way to do this without the for loop?

You can simply use array assignment:

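import numpy as np

def encode_vectorized(a, n, dtype=int):
    # Append one trailing axis of length n to hold each one-hot vector.
    out = np.zeros(a.shape + (n,), dtype=dtype)
    # Along the last axis, write a 1 at the index given by each entry of a.
    np.put_along_axis(out, a[..., None], 1, axis=-1)
    return out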
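
As a quick check of the vectorized approach, here is a minimal sketch that runs it on a small made-up array and compares it with another loop-free variant. The sample `sequences` values, `vocab_size = 4`, and the helper name `encode_eye` are illustrative assumptions, not part of the original question or answer.

```python
import numpy as np

def encode_vectorized(a, n, dtype=int):
    out = np.zeros(a.shape + (n,), dtype=dtype)
    np.put_along_axis(out, a[..., None], 1, axis=-1)
    return out

def encode_eye(a, n, dtype=int):
    # Hypothetical alternative: row i of the identity matrix is the
    # one-hot vector for class i, so fancy indexing does the encoding.
    return np.eye(n, dtype=dtype)[a]

# Small made-up input: 2 sequences of length 3, class ids in [0, vocab_size).
sequences = np.array([[2, 0, 3], [1, 1, 0]], dtype=np.int32)
vocab_size = 4

y = encode_vectorized(sequences, vocab_size, dtype='int16')
y_eye = encode_eye(sequences, vocab_size, dtype='int16')
```

Both functions return an array of shape `sequences.shape + (vocab_size,)`. The `np.eye` variant first builds a `vocab_size × vocab_size` matrix, so for a large vocabulary the `put_along_axis` version is likely to use less memory.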

For one-hot encoding (OHE) tasks, I always use pd.get_dummies.

Here is a simple example:

import pandas as pd
s = pd.Series(list('abca'))

pd.get_dummies(s, dtype=int)  # dtype=int keeps the 0/1 output; pandas >= 2.0 defaults to bool
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
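
Note that `pd.get_dummies` only creates columns for values that actually occur, whereas the question needs a fixed `vocab_size` for every sequence. The sketch below shows one possible way to adapt it to a 2-D integer array, assuming the class ids lie in `[0, vocab_size)`; the sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

vocab_size = 5
sequences = np.array([[3, 0], [3, 1]])  # made-up class ids

# Declaring all categories up front makes get_dummies emit one column
# per class, including classes that never appear in the data.
flat = pd.Series(
    pd.Categorical(sequences.ravel(), categories=list(range(vocab_size)))
)
one_hot = (
    pd.get_dummies(flat, dtype=int)
    .to_numpy()
    .reshape(sequences.shape + (vocab_size,))
)
```

This round-trips through a DataFrame, so for large arrays the pure NumPy approach is probably the better fit.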

Resource:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
