简体   繁体   中英

I used word2vec in deeplearning4j to train word vectors, but those vectors are unstable

1.I use IntelliJ IDEA build a maven project,code is as follows:

    System.out.println("Load data....");
    SentenceIterator iter = new LineSentenceIterator(new File("/home/zs/programs/deeplearning4j-master/dl4j-test-resources/src/main/resources/raw_sentences.txt"));
    iter.setPreProcessor(new SentencePreProcessor() {
        @Override

            return sentence.toLowerCase();
        }
    });
    System.out.println("Build model....");
    int batchSize = 1000;
    int iterations = 30;
    int layerSize = 300;
    com.sari.Word2Vec vec= new  com.sari.Word2Vec.Builder()
            .batchSize(batchSize) //# words per minibatch.
            .sampling(1e-5) // negative sampling. drops words out
            .minWordFrequency(5) //
            .useAdaGrad(false) //
            .layerSize(layerSize) // word feature vector size
            .iterations(iterations) // # iterations to train
            .learningRate(0.025) //
            .minLearningRate(1e-2) // learning rate decays wrt # words. floor learning
            .negativeSample(10) // sample size 10 words
            .iterate(iter) //
            .tokenizerFactory(tokenizer)
            .build();
    vec.fit();
    System.out.println("Evaluate model....");
    double cosSim = vec.similarity("day" , "night");
    System.out.println("Similarity between day and night: "+cosSim);

This code is reference the word2vec in deeplearning4j,but the result is unstable.The results of each experiment were very different.for example, with the cosine value of the similarity between 'day'and 'night', sometimes the result is as high as 0.98, sometimes as low as 0.4?

Here are the results of two experiments

Evaluate model....
Similarity between day and night: 0.706292986869812

Evaluate model....
Similarity between day and night: 0.5550910234451294

Why the result like this.Because I have just started learning word2vec, there are a lot of knowledge is not understood, I hope that seniors can help me,thanks!

You have set the following line:

.minLearningRate(1e-2) // learning rate decays wrt # words. floor learning

But that is an extremely high learning rate. The high learning rate causes the model to not 'settle' in any state, but instead a few updates significantly changes the learned representation. That is not a problem during the first few updates, but bad for convergence.

Solution: Allow learning rate to decay. You can leave this line out completely, or if you must you can use a more appropriate value, such as 1e-15

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM