
How do you deal with randomness when evaluating a model?

I'm currently training and comparing the performance of two deep learning models. For now, I use a specific seed only for the train-test split. However, due to the randomness in the models, the loss of the same model differs every time, even with the same seed. Is it better to set a seed everywhere to perfectly control the result, or to keep the randomness? If the latter, should I run the same seed several more times and average the losses, or should I pick the seed's best/worst performance?

Also, I've read some conference papers that evaluate a model over a number of random seeds and average the results, and I wonder how those seeds were chosen. If I want to compare two models, should I test them with the same seeds (e.g. seeds 0, 1, 2 for both models A and B) or with different ones (e.g. seeds 0, 1, 2 for model A and seeds 5, 6, 7 for model B) depending on the results? That is, should I always pick the best-performing seeds no matter what? Thank you in advance.

For splitting the data, you should use the same seed for each model so that each model is trained and tested on identical data.
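To make this concrete, here is a minimal sketch of a reproducible split using plain numpy (the helper function train_test_split_indices is hypothetical, not part of any library; libraries like scikit-learn expose the same idea via a random_state parameter):

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction, seed):
    """Return reproducible (train, test) index arrays for a dataset of n_samples."""
    rng = np.random.default_rng(seed)      # seeded generator -> same split every run
    perm = rng.permutation(n_samples)      # shuffled sample indices
    n_test = int(n_samples * test_fraction)
    return perm[n_test:], perm[:n_test]    # train indices, test indices

# Both models see exactly the same split because the seed is fixed:
train_a, test_a = train_test_split_indices(100, 0.2, seed=42)
train_b, test_b = train_test_split_indices(100, 0.2, seed=42)
assert np.array_equal(train_a, train_b) and np.array_equal(test_a, test_b)
```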

For training the models, the choice of seed doesn't matter much, except when you are training two models with identical architecture: if those two models share the same seed, they will produce identical results. So when training, you should use distinct seeds for models of the same architecture.

You can use numpy.random.SeedSequence to help with generating seeds: you keep track of a single entropy value but can spawn unique, independent seeds from it:

import numpy as np

entropy = 100  # the single value you need to record
seed_sequence = np.random.SeedSequence(entropy, pool_size=5)
spawns = seed_sequence.spawn(2)  # two independent child sequences
split_seeds = spawns[0].pool  # for splitting train-test
# [179453401, 3816112049, 3806930416, 1196834953, 1391596624]
model_seeds = spawns[1].pool  # for training models
# [2353154363,  511151844,    6211548, 1188290456, 3368787154]

Notice you need only keep track of entropy. Set pool_size to however many times you want to repeat the experiment.

As noted in the documentation:

Best practice for achieving reproducible bit streams is to use the default None for the initial entropy, and then use SeedSequence.entropy to log/pickle the entropy for reproducibility

So rather than specifying entropy = 100 on your first run, don't specify it at all and simply save seed_sequence.entropy after your first run.
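Following that recommendation, a minimal sketch of the log-and-replay workflow (using only numpy.random.SeedSequence, nothing hypothetical):

```python
import numpy as np

# First run: let NumPy draw fresh entropy, then record it.
seed_sequence = np.random.SeedSequence()   # entropy=None is the default
logged_entropy = seed_sequence.entropy     # save this value (log file, config, ...)

# Later run: reconstruct the exact same seed stream from the logged value.
replayed = np.random.SeedSequence(logged_entropy)
assert replayed.generate_state(4).tolist() == seed_sequence.generate_state(4).tolist()
```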
