
Normalisation layer for tf.data.Dataset

I am trying to improve the TensorFlow tutorial on Time series forecasting. The code is quite long, but my question concerns only a small part of it. In the tutorial the data is normalized in the usual way: it is demeaned and scaled using the mean and standard deviation of the training set.

train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
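As a quick sanity check, the same train-statistics standardization can be reproduced with plain NumPy (the arrays below are made up for illustration; rows are time steps, columns are features):

```python
import numpy as np

# Made-up data standing in for train_df and val_df.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
val = np.array([[4.0, 40.0]])

# Statistics are computed on the training set only...
train_mean = train.mean(axis=0)
train_std = train.std(axis=0, ddof=1)  # pandas .std() uses ddof=1 by default

# ...and applied to every split, so val/test stay on the train scale.
train_norm = (train - train_mean) / train_std
val_norm = (val - train_mean) / train_std
```

Note that the validation (and test) rows are deliberately scaled with the training statistics, so values outside the training range can land well beyond ±1.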

Then a tf.data.Dataset is generated to feed the data to the models:

def make_dataset(self, data):

  data = np.array(data, dtype=np.float32)
  ds = tf.keras.utils.timeseries_dataset_from_array(
      data=data,
      targets=None,
      sequence_length=self.total_window_size,
      sequence_stride=1,
      shuffle=True)
  ds = ds.map(self.split_window)

  return ds

This function is a method of a class that is too long to reproduce here. What matters is that it returns tuples of inputs and labels:

for example_inputs, example_labels in my_class_instance.train.take(1):
  print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
  print(f'Labels shape (batch, time, features): {example_labels.shape}')

Returns:

Inputs shape (batch, time, features): (32, 6, 19) 
Labels shape (batch, time, features): (32, 1, 1)

The problem with this approach is that both the loss function and the metrics refer to the standardized variables (including the target variable) rather than the actual values we are trying to predict. To solve this, I would like to leave the features (and hence the target variable) unstandardized and instead introduce a feature-normalization layer in the machine learning models. I thought of using something like this:

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(np.array(train_features))
model.add(normalizer)
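A related workaround that keeps the current setup is to map predictions and labels back to the original units before computing metrics. A minimal NumPy sketch, assuming the train mean and standard deviation of the target column have been kept around (the numbers here are made-up stand-ins, not the tutorial's data):

```python
import numpy as np

# Hypothetical train statistics for the target column.
target_mean, target_std = 9.5, 8.4

# Standardized predictions and labels, as they come out of the model/dataset.
y_pred_std = np.array([-0.5, 0.0, 1.2])
y_true_std = np.array([-0.4, 0.1, 1.0])

# Undo the standardization so the error is in the target's actual units.
y_pred = y_pred_std * target_std + target_mean
y_true = y_true_std * target_std + target_mean

mae_original_units = np.abs(y_pred - y_true).mean()
```

Since the transform is affine, the MAE in original units is just the standardized MAE multiplied by target_std, so this only changes how the number reads, not which model wins.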

My question is: how can I add such a normalization layer so that it standardizes only the features and not the labels?

I have already taken one step, which is removing the batching from the dataset, so that to reproduce the result above I now need to batch explicitly:

for example_inputs, example_labels in my_class_instance.train.batch(32).take(1):
  print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
  print(f'Labels shape (batch, time, features): {example_labels.shape}')

Returns:

Inputs shape (batch, time, features): (32, 6, 19) 
Labels shape (batch, time, features): (32, 1, 1)

You should be able to do something like this:

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(my_class_instance.train.map(lambda x, y: x))
model.add(normalizer)

where x represents your features and y your labels. And just as a reminder:

Calling adapt() on a Normalization layer is an alternative to passing in mean and variance arguments during layer construction.
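To make concrete that only the features are touched, here is a NumPy sketch of what adapt() computes on the mapped dataset and what the layer then does at call time. Shapes mirror the (batch, time, features) tuples above; the values are random stand-ins, not the tutorial's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the (inputs, labels) tuples the dataset yields.
inputs = rng.normal(loc=5.0, scale=2.0, size=(32, 6, 19))  # (batch, time, features)
labels = rng.normal(size=(32, 1, 1))                       # (batch, time, features)

# adapt() on ds.map(lambda x, y: x) sees only the inputs: with axis=-1 it
# keeps one mean/variance per feature, pooled over the batch and time axes.
feat_mean = inputs.mean(axis=(0, 1))
feat_var = inputs.var(axis=(0, 1))

# At call time the layer standardizes the features...
inputs_norm = (inputs - feat_mean) / np.sqrt(feat_var)

# ...while the labels never pass through the layer and reach the loss untouched.
```

Because the labels never flow through the model's input, the loss and metrics stay in the target's original units, which is exactly the behaviour asked for in the question.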
