
What does "reset" actually mean for a TensorFlow 2 Dataset?

I'm following the TensorFlow 2 Keras documentation. My training code looks like this:

import numpy as np
import tensorflow as tf

# _my_cus_func, _process_tensors, get_compiled_model, X_train and y_train are defined elsewhere (not shown).
train_dataset = tf.data.Dataset.from_tensor_slices((np.array([_my_cus_func(i) for i in X_train]), y_train))
train_dataset = train_dataset.map(_process_tensors, num_parallel_calls=4)  # tuple elements are unpacked into (vals, lab)
train_dataset = train_dataset.shuffle(buffer_size=10000)
train_dataset = train_dataset.batch(64, drop_remainder=True)
train_dataset = train_dataset.prefetch(1)
model = get_compiled_model()
model.fit(train_dataset, epochs=100)
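In the pipeline above, the list comprehension passed to from_tensor_slices is ordinary Python: it runs once, when the dataset is built, and its result is stored as a constant. Everything chained after it (map, shuffle, batch) is part of the tf.data pipeline. The sketch below uses toy data and two hypothetical helpers, my_cus_func (a stand-in for _my_cus_func) and count_in_map (wrapped in tf.py_function so its Python side effect is visible), to count calls over three full passes.

```python
import numpy as np
import tensorflow as tf

calls = {"preprocess": 0, "map": 0}

def my_cus_func(x):
    # Hypothetical stand-in for the question's _my_cus_func (plain Python/NumPy).
    calls["preprocess"] += 1
    return x * 2

def count_in_map(x):
    # Hypothetical helper run inside the tf.data pipeline via tf.py_function.
    calls["map"] += 1
    return x

X_train = [1, 2, 3]  # toy data

# The list comprehension runs here, once, before tf.data is involved.
features = np.array([my_cus_func(i) for i in X_train], dtype=np.int64)
ds = tf.data.Dataset.from_tensor_slices(features)
ds = ds.map(lambda v: tf.py_function(count_in_map, [v], tf.int64))

# Three full passes, as three Keras epochs without steps_per_epoch would make.
for epoch in range(3):
    values = [int(v.numpy()) for v in ds]

print(values)  # [2, 4, 6]
print(calls)   # {'preprocess': 3, 'map': 9}
```

The preprocessing count stays at 3 (once per element), while the map count grows by 3 on every pass.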

The documentation says:

Note that the Dataset is reset at the end of each epoch, so it can be reused for the next epoch.

If you want to run training only on a specific number of batches from this Dataset, you can pass the steps_per_epoch argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.

If you do this, the dataset is not reset at the end of each epoch, instead we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely-looping dataset).
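The difference described in the documentation can be reproduced without Keras. This is a sketch under the assumption that "reset" corresponds to creating a fresh iterator per epoch, and that steps_per_epoch behaves like drawing a fixed number of batches from one shared iterator; take_steps is a hypothetical helper.

```python
import tensorflow as tf

# Toy dataset: 6 elements in 3 batches of 2.
ds = tf.data.Dataset.range(6).batch(2)

def take_steps(iterator, steps):
    """Hypothetical helper: draw up to `steps` batches, stopping early if the iterator runs out."""
    out = []
    for _ in range(steps):
        batch = next(iterator, None)
        if batch is None:
            break
        out.append(batch.numpy().tolist())
    return out

# Reset: each pass creates a fresh iterator, so both passes start at [0, 1].
pass_1 = [b.numpy().tolist() for b in ds]
pass_2 = [b.numpy().tolist() for b in ds]

# No reset: one iterator shared across "epochs" of 2 steps each,
# roughly what steps_per_epoch=2 does with this finite dataset.
it = iter(ds)
epoch_1 = take_steps(it, 2)
epoch_2 = take_steps(it, 2)  # runs out after one batch
epoch_3 = take_steps(it, 2)  # nothing left; Keras would interrupt training here

print(pass_1)   # [[0, 1], [2, 3], [4, 5]]
print(pass_2)   # [[0, 1], [2, 3], [4, 5]]
print(epoch_1)  # [[0, 1], [2, 3]]
print(epoch_2)  # [[4, 5]]
print(epoch_3)  # []
```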

What does "reset" actually mean? Will TensorFlow re-read the data from the tensor slices after every epoch, or does it only reshuffle and rerun the map function? I want TensorFlow to re-read the data from NumPy after each epoch and run _my_cus_func on it again. I could instead apply _my_cus_func through the dataset's map or apply API, but I'm more comfortable doing this on a Python list or NumPy array.
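One way to keep the preprocessing in plain Python/NumPy and still have it rerun on every reset, offered here as an assumption rather than as part of the original answer, is tf.data.Dataset.from_generator: tf.data calls the generator function again for each new iterator. In this sketch my_cus_func is a hypothetical stand-in for _my_cus_func, and the arrays are toy data.

```python
import numpy as np
import tensorflow as tf

calls = {"n": 0}

def my_cus_func(x):
    # Hypothetical stand-in for the question's _my_cus_func; counts its calls.
    calls["n"] += 1
    return np.float32(x * 2.0)

# Toy stand-ins for X_train / y_train.
X_train = np.array([1.0, 2.0, 3.0], dtype=np.float32)
y_train = np.array([0, 1, 0], dtype=np.int64)

def gen():
    # Called again for every new iterator (every reset), so my_cus_func
    # is re-applied to the raw NumPy data on each pass.
    for x, y in zip(X_train, y_train):
        yield my_cus_func(x), y

ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int64),
    ),
)

# Two full passes (two epochs without steps_per_epoch).
for _ in range(2):
    feats = [float(f.numpy()) for f, _label in ds]

print(feats)       # [2.0, 4.0, 6.0]
print(calls["n"])  # 6
```

The remaining steps of the question's pipeline (map, shuffle, batch, prefetch) can be chained onto ds as before. The trade-off is that the generator runs in Python, so the TensorFlow documentation notes it is still subject to the GIL and scales less well than from_tensor_slices plus map.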

In this context, reset means starting to iterate over the dataset from the beginning again, which reruns the whole pipeline (map, shuffle, batch) from its source. In your particular case, the code lacks a repeat() call. So if you specify the steps_per_epoch parameter like this:

model.fit(train_dataset, steps_per_epoch=N, epochs=100)

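It will draw N batches per epoch without resetting the dataset. If N is smaller than the number of batches in one pass over the data, each epoch continues where the previous one stopped, and training is interrupted once all batches have been used. If N is larger, the data runs out during the first epoch and training stops there. If you add repeat():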
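Whether a finite dataset is large enough comes down to simple arithmetic: without repeat(), one pass yields a fixed number of batches, and training with steps_per_epoch needs steps_per_epoch * epochs of them. The helpers below are hypothetical, and the numbers (1000 examples) are made up for illustration; batch size 64 with drop_remainder=True matches the question.

```python
def batches_per_pass(num_examples, batch_size, drop_remainder=True):
    """Batches produced by one pass over a finite, batched dataset."""
    full, rest = divmod(num_examples, batch_size)
    return full if drop_remainder or rest == 0 else full + 1

def runs_out(num_examples, batch_size, steps_per_epoch, epochs, drop_remainder=True):
    """True if steps_per_epoch * epochs exceeds the batches available without repeat()."""
    needed = steps_per_epoch * epochs
    return needed > batches_per_pass(num_examples, batch_size, drop_remainder)

# Hypothetical numbers: 1000 training examples, batch(64, drop_remainder=True).
print(batches_per_pass(1000, 64))   # 15
print(runs_out(1000, 64, 10, 100))  # True: 1000 batches needed, only 15 available
print(runs_out(1000, 64, 15, 1))    # False
```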

train_dataset = train_dataset.shuffle(buffer_size=10000).repeat()

It will start a new pass over the dataset whenever the data is exhausted, not when a new epoch starts. With an infinitely repeating dataset, steps_per_epoch is what ends each epoch.
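A small deterministic sketch of this, with toy data and no shuffle: repeat() is placed before batch(), as in the answer's shuffle(...).repeat() followed by the question's batch(...), and steps_per_epoch=3 is mimicked by splitting the stream of batches into groups of 3.

```python
import tensorflow as tf

# 5 elements, repeated forever, then batched in pairs (repeat before batch).
ds = tf.data.Dataset.range(5).repeat().batch(2)

# Take 6 batches and split them into 2 "epochs" of steps_per_epoch=3.
batches = [b.numpy().tolist() for b in ds.take(6)]
epochs = [batches[0:3], batches[3:6]]

print(epochs[0])  # [[0, 1], [2, 3], [4, 0]]
print(epochs[1])  # [[1, 2], [3, 4], [0, 1]]
```

The pass boundary (after element 4) does not line up with the epoch boundary, and with repeat() before batch() a single batch can even span two passes. With shuffle() placed before repeat(), each pass is reshuffled, since reshuffle_each_iteration defaults to True.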
