Val_loss "NaN" After Decreasing Sample Size in the CSV File but Processed Data Is the Same
I have tried the following example, which works very well. In the example file, the values are stored at 10-minute intervals. However, since I need to bring in additional values that are only available hourly, I deleted from the database all rows whose timestamp was not on a full hour. That is: there are now only 1/6 as many rows, plus three more columns that are not yet selected in this test run.
If I now execute the code exactly as before, the following step
from tensorflow import keras

path_checkpoint = "model_checkpoint.h5"
es_callback = keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=0, patience=5)
modelckpt_callback = keras.callbacks.ModelCheckpoint(
    monitor="val_loss",
    filepath=path_checkpoint,
    verbose=1,
    save_weights_only=True,
    save_best_only=True,
)
history = model.fit(
    dataset_train,
    epochs=epochs,
    validation_data=dataset_val,
    callbacks=[es_callback, modelckpt_callback],
)
always prints this val_loss message for every epoch:
Epoch 1/10
871/871 [==============================] - ETA: 0s - loss: 0.4529
Epoch 1: val_loss did not improve from inf
871/871 [==============================] - 288s 328ms/step - loss: 0.4529 - val_loss: nan
I think it is related to this earlier code block,
split_fraction = 0.715
train_split = int(split_fraction * int(df.shape[0]))

step = 6
past = 720
future = 72
learning_rate = 0.001
batch_size = 256
epochs = 10

def normalize(data, train_split):
    # Scale with statistics computed on the training portion only
    data_mean = data[:train_split].mean(axis=0)
    data_std = data[:train_split].std(axis=0)
    return (data - data_mean) / data_std
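For reference, `normalize` standardizes every column using the mean and standard deviation of the training slice only, so no validation data leaks into the scaling. A minimal self-contained sketch of its behavior, using hypothetical toy data in place of the weather DataFrame:

```python
import numpy as np

def normalize(data, train_split):
    # Standardize with statistics from the training slice only,
    # so the validation rows do not influence the scaling.
    data_mean = data[:train_split].mean(axis=0)
    data_std = data[:train_split].std(axis=0)
    return (data - data_mean) / data_std

# Hypothetical toy array: 10 rows, 2 feature columns
data = np.arange(20, dtype=float).reshape(10, 2)
train_split = int(0.715 * data.shape[0])  # 7 rows, mirroring split_fraction
scaled = normalize(data, train_split)

# The training slice now has (approximately) zero mean per column
print(np.allclose(scaled[:train_split].mean(axis=0), 0.0))  # True
```

Note that if a column is constant over the training slice, `data_std` is zero and the division produces NaN/inf values, which is one common way NaNs enter a pipeline like this.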
where the original author specifies that only every sixth record should be used. Since I already removed the intermediate records beforehand, it should now use all remaining records. Therefore I already tried setting step = 1, but without success. It still prints the message that val_loss did not improve from inf.
Does anyone know what else I need to adjust now that there are only one-sixth as many rows as the code originally expected? The result should initially match the example's values, because I have not yet brought in the new data.
The issue was inside the .csv file. In two of the 300000 rows, the date was formatted as 25.10. 18, but in all other rows it was 25.10. 2018.
After editing the format to a consistent dd.mm.yyyy, the val_loss decreased as expected.
If you are facing the same issue, this code can help you find wrongly formatted rows:
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
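The line above raises a ValueError on the first row that does not match the format. As a sketch (using a hypothetical three-row frame), passing `errors='coerce'` instead turns unparseable entries into NaT, so all offending rows can be listed at once:

```python
import pandas as pd

# Hypothetical frame: the middle row uses the broken "25.10. 18" format
df = pd.DataFrame({'Date Time': ['25.10.2018 00:00:00',
                                 '25.10. 18 00:10:00',
                                 '26.10.2018 00:00:00']})

# errors='coerce' converts entries that do not match the format to NaT
# instead of raising, so the bad row indices can be inspected directly.
parsed = pd.to_datetime(df['Date Time'], format='%d.%m.%Y %H:%M:%S',
                        errors='coerce')
bad_rows = df[parsed.isna()]
print(bad_rows.index.tolist())  # [1]
```

Once the misformatted rows are located, they can be corrected in the CSV and the strict (raising) call re-run to confirm the whole column parses cleanly.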