
How to adapt TextVectorization layer on tf.Dataset

I load my dataset like this:

self.train_ds = tf.data.experimental.make_csv_dataset(
            self.config["input_paths"]["data"]["train"],
            batch_size=self.params["batch_size"],
            shuffle=False,
            label_name="tags",
            num_epochs=1,
        )

My TextVectorization layer looks like this:

vectorizer = tf.keras.layers.TextVectorization(
            standardize=code_standaridization,
            split="whitespace",
            output_mode="int",
            output_sequence_length=params["input_dim"],
            max_tokens=100_000,
        )

And I thought this would be enough:

vectorizer.adapt(data_provider.train_ds)

But it's not enough; I get this error:

TypeError: Expected string, but got Tensor("IteratorGetNext:0", shape=(None, None), dtype=string) of type 'Tensor'.

Can I somehow adapt my vectorizer on a TensorFlow dataset?

Most probably the issue is that you use batch_size in your train_ds without calling .unbatch() when you try to adapt.

You have to do:

vectorizer.adapt(train_ds.unbatch().map(lambda x, y: x).batch(BATCH_SIZE))

The .unbatch() solves the error that you are currently seeing, and the .map() is needed because the TextVectorization layer operates on batches of strings, so you need to extract just the text from your (features, label) dataset before re-batching.
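
For reference, here is a minimal, self-contained sketch of that pattern on a toy in-memory dataset (not your CSV); the texts, labels and BATCH_SIZE below are made up for illustration. Note that with make_csv_dataset the features arrive as a dict keyed by column name, so in your case the .map() would typically select the text column, e.g. lambda features, label: features["code"], where "code" stands in for whatever your text column is actually called.

import tensorflow as tf

BATCH_SIZE = 2  # illustrative batch size

# Toy batched dataset of (text, label) pairs, standing in for train_ds.
texts = ["def foo(): pass", "x = 1 + 2", "print('hi')", "import os"]
labels = [0, 1, 0, 1]
train_ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(BATCH_SIZE)

vectorizer = tf.keras.layers.TextVectorization(
    split="whitespace",
    output_mode="int",
    output_sequence_length=8,
    max_tokens=100_000,
)

# Unbatch, keep only the text, and re-batch so adapt() sees batches of strings.
text_only_ds = train_ds.unbatch().map(lambda x, y: x).batch(BATCH_SIZE)
vectorizer.adapt(text_only_ds)

print(vectorizer.get_vocabulary()[:10])  # inspect the learned vocabulary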
