
For an infinite dataset, is the data used in each epoch the same?

In TensorFlow, suppose I have a dataset created from a generator:

dataset = tf.data.Dataset.from_generator(gen...)

and this generator produces an infinite stream of non-repeating data (like the digits of an infinite, non-recurring decimal).

model.fit(dataset, steps_per_epoch=10000, epochs=5)

Now, across these 5 training epochs, is the same data used? That is, always the first 10000 items from the generator, rather than items 0-9999 in epoch 1, items 10000-19999 in epoch 2, and so on?

What about the initial_epoch parameter? If I set it to 1, will the model start training from the 10000th item?

model.fit(dataset, steps_per_epoch=10000, epochs=5, initial_epoch=1)

Update: this simple test shows that the dataset is reset on every call to model.fit():

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

def gen():
    i = 1
    while True:
        yield np.array([[i]]), np.array([[0]])
        i += 1

ds = tf.data.Dataset.from_generator(gen, output_types=(tf.int32, tf.int32)).batch(3)

# Identity model: the "loss" is just the mean of the inputs,
# which reveals exactly which items each batch contains.
x = Input(shape=(1, 1))
model = Model(inputs=x, outputs=x)

model.compile('adam', loss=lambda true, pred: tf.reduce_mean(pred))
for i in range(10):
    model.fit(ds, steps_per_epoch=5, epochs=1)

Output:

1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 9ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 2ms/step - loss: 8.0000

5 epochs in a single call:

model.fit(ds, steps_per_epoch=5, epochs=5)

Output:

Epoch 1/5
1/5 [=====>........................] - ETA: 0s - loss: 2.0000
5/5 [==============================] - 0s 9ms/step - loss: 8.0000
Epoch 2/5
1/5 [=====>........................] - ETA: 0s - loss: 17.0000
5/5 [==============================] - 0s 2ms/step - loss: 23.0000
Epoch 3/5
1/5 [=====>........................] - ETA: 0s - loss: 32.0000
5/5 [==============================] - 0s 2ms/step - loss: 38.0000
Epoch 4/5
1/5 [=====>........................] - ETA: 0s - loss: 47.0000
5/5 [==============================] - 0s 2ms/step - loss: 53.0000
Epoch 5/5
1/5 [=====>........................] - ETA: 0s - loss: 62.0000
5/5 [==============================] - 0s 2ms/step - loss: 68.0000
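The losses confirm that, within a single fit() call, the generator keeps running across epochs: each epoch consumes 5 batches of 3 consecutive integers, so epoch e (counting from 0) starts at item 15·e + 1. A plain-Python sketch reproducing the first-batch losses above (the first_batch_mean helper is purely illustrative):

```python
# Each epoch consumes 5 batches of 3 consecutive integers.
# Within one fit() call the generator is not reset, so epoch e
# (0-indexed) starts at item 15*e + 1.
def first_batch_mean(epoch):
    start = 15 * epoch + 1
    batch = [start, start + 1, start + 2]
    return sum(batch) / len(batch)

print([first_batch_mean(e) for e in range(5)])
# [2.0, 17.0, 32.0, 47.0, 62.0] -- matches the first-batch losses above
```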

No, the data used is different. Keras uses steps_per_epoch to determine the length of each epoch (since the generator has no length), so that it knows when to end training (or to invoke checkpointers, etc.).

initial_epoch is merely the number displayed as the epoch; it is useful when you want to resume training from a checkpoint (see the fit method documentation), and it has nothing to do with data iteration.

If you pass the same dataset to the model.fit method, it is reset on every function call (thanks to the OP for this information).
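If you instead want successive fit() calls to continue where the previous one stopped, one plain-Python workaround is to keep a single iterator alive across calls (model.fit also accepts Python generators). A minimal sketch of the idea, with no TensorFlow involved; the take helper is hypothetical, standing in for one fit() call's consumption:

```python
def gen():
    i = 1
    while True:
        yield i
        i += 1

# One iterator, created once: its position persists across consumers,
# unlike a tf.data.Dataset passed to fit(), which restarts on each call.
it = iter(gen())

def take(n):
    # hypothetical helper standing in for one fit() call's consumption
    return [next(it) for _ in range(n)]

print(take(3))  # [1, 2, 3]
print(take(3))  # [4, 5, 6] -- continues instead of restarting
```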

Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license; if you need to repost, please cite this site's URL or the original source.

© 2020-2024 STACKOOM.COM