
Implementing Label Encoder as a Tensorflow Preprocessing layer

From my understanding of SKLearn's documentation, LabelEncoder in SKLearn encodes values between 0 and the number of classes minus 1 (i.e. n_classes - 1).
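For reference, a minimal sklearn sketch of that behaviour (the fish class names are just placeholder data):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
labels = le.fit_transform(['fish1', 'fish2', 'fish3', 'fish2'])
print(labels)       # [0 1 2 1]
print(le.classes_)  # ['fish1' 'fish2' 'fish3']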

I wanted to use something similar as part of a Tensorflow preprocessing operation, to avoid pulling in SKLearn as a dependency of the package. For example, I understand the preprocessing layers already provide APIs for one-hot encoding and categorical encoding, such as:

tf.keras.layers.CategoryEncoding(
    num_tokens=None, output_mode='multi_hot', sparse=False, **kwargs
)
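As a small illustrative sketch (the values below are just example indices), CategoryEncoding consumes integer class indices and emits one-hot/multi-hot vectors; it does not map raw string labels to integer ids the way LabelEncoder does:

import tensorflow as tf

# CategoryEncoding expects integer inputs and produces one-hot vectors,
# not integer class indices.
layer = tf.keras.layers.CategoryEncoding(num_tokens=3, output_mode='one_hot')
print(layer([0, 1, 2, 1]))
# tf.Tensor of shape (4, 3), with a 1.0 in the column of each class index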

Is there any way to get LabelEncoder behaviour through certain arguments of the CategoryEncoding API, or do I have to define a brand new preprocessing layer using the abstract base class template provided in the Tensorflow documentation?

If so, is there any reference on how I can write my own class to use LabelEncoder as a Tensorflow layer?

IIUC, you just need sparse integer labels. So, maybe try something simple and naive first:

classes = ['fish1', 'fish2', 'fish3']
data = ['fish1', 'fish2', 'fish3', 'fish2', 'fish3', 'fish1']

# Map each class name to its index in 0 .. n_classes - 1
class_indices = dict(zip(classes, range(len(classes))))
labels = list(map(class_indices.get, data))

print(labels)
# [0, 1, 2, 1, 2, 0]

Or, with Tensorflow, you can use a StaticHashTable:

import tensorflow as tf

classes = ['fish1', 'fish2', 'fish3']
data = tf.constant(['fish1', 'fish2', 'fish3', 'fish2', 'fish3', 'fish1'])

# Map each class string to an integer id; unknown strings map to -1
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(tf.constant(classes), tf.range(len(classes))),
    default_value=-1)

label_encoder = tf.keras.layers.Lambda(lambda x: table.lookup(x))

print(label_encoder(data))
# tf.Tensor([0 1 2 1 2 0], shape=(6,), dtype=int32)
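If you do want to package this as its own preprocessing layer class, a minimal sketch wrapping the same lookup table in a tf.keras.layers.Layer subclass could look like the following. The class name LabelEncoderLayer is just illustrative, and the built-in tf.keras.layers.StringLookup layer covers a similar need:

class LabelEncoderLayer(tf.keras.layers.Layer):
    def __init__(self, classes, **kwargs):
        super().__init__(**kwargs)
        self.classes = list(classes)
        # Same string -> integer lookup as above; -1 marks unseen labels
        self.table = tf.lookup.StaticHashTable(
            tf.lookup.KeyValueTensorInitializer(
                tf.constant(self.classes), tf.range(len(self.classes))),
            default_value=-1)

    def call(self, inputs):
        return self.table.lookup(inputs)

    def get_config(self):
        return {**super().get_config(), 'classes': self.classes}

print(LabelEncoderLayer(classes)(data))
# tf.Tensor([0 1 2 1 2 0], shape=(6,), dtype=int32)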
