
Why does the loss value of a Word2Vec model become zero after a few epochs?

I was using gensim to train a Word2Vec model with 'text8' as the corpus, but I found that the loss became 0 after a few epochs and I don't know what to do. Could you please help me find what went wrong?

[screenshot of the model-training code]

The hidden part of my code is: callbacks=[Callback()]

And here is the result:

vector_size=100, learning_rate=0.01

Loss after epoch 0:-60790632.0
Loss after epoch 1:15678536.0
Loss after epoch 2:15936896.0
Loss after epoch 3:15933712.0
Loss after epoch 4:13241488.0
Loss after epoch 5:0.0
Loss after epoch 6:0.0
...
Loss after epoch 28:0.0
Loss after epoch 29:0.0
Running time: 1412.9658317565918 seconds

Here is my code:

from gensim.models import word2vec
from gensim.models.callbacks import CallbackAny2Vec
import logging
import time

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences = word2vec.Text8Corpus("text8")   # loading the corpus

loss_list = [0]  # cumulative loss after each epoch, seeded with 0

class Callback(CallbackAny2Vec):
    def __init__(self):
        self.epoch = 0

    def on_epoch_end(self, model):
        loss = model.get_latest_training_loss()  # cumulative since training began
        now_loss = loss - loss_list[-1]          # difference = this epoch's loss
        loss_list.append(loss)
        print('Loss after epoch {}:{}'.format(self.epoch, now_loss))
        self.epoch += 1

start_time = time.time()

model = word2vec.Word2Vec(sentences, hs=1, sg=1, compute_loss=True, epochs=30,
                          callbacks=[Callback()])

end_time = time.time()
print('Running time: %s seconds' % (end_time - start_time))

Note that the running-loss reporting in Gensim has a number of known problems and inconsistencies. You can read an overview of the problems in the project's open issue #2617.

You seem to be specifically hitting issue #2735, where two bad choices in the code (using a too-narrow floating-point representation, and tallying loss into an entire-training-session sum that grows ever-larger) interact to make the running tally essentially impervious to tiny incremental updates. That creates the illusion that no additional loss accrued during an epoch, though in fact there was plenty; it just didn't tally into the running total.

The workaround in that issue's discussion may help you: at the end of each epoch, after you've grabbed the number you need, reset the model's internal value to 0.0:

model.running_training_loss = 0.0

Note that after doing this, you won't need to be as fancy about calculating the latest-epoch loss in your own code, because you've changed the model to only ever report the latest epoch.

But beware that other flaky things in the code may still render reported values less reliable/interesting than they should be, especially if using very-large training epochs (not a problem with the tiny text8 ) or many worker threads.

And further: be careful about over-relying on the loss numbers for anything other than judging whether the model will benefit from more training epochs. (It's not the case that a model with a lower loss is necessarily better than another – it might just be overfit, a risk with larger models and smaller data. It's only the case that when loss stops improving, the model is as good as it can get given its state/parameters/training-data.)

Separately:

  • By supplying hs=1 without also turning off the default negative-sampling with negative=0, you've created a non-standard hybrid model that's being optimized for both a hierarchical-softmax output layer and a negative-sampling output layer. You probably don't want to do that; it'll slow training for little benefit. Usually one, or the other, is better.

  • Your screenshotted code includes some other non-default parameters like alpha=0.01 (an odd choice) and min_count=40 (which might be appropriate for a large dataset, but against tiny text8 might leave you with a very-small working vocabulary that leaves the 100-dimensional model oversized). In general, while there's nothing guaranteed-good about the model's defaults, you should only change them if you have a good theory for why your goals/data might benefit from other values, and also, ideally, a way to evaluate your results to check if such tinkering is helping or hurting.
