
Is there an alternative to fully loading pre-trained word embeddings in memory?

I want to use pre-trained word embeddings in my machine learning model. The word embeddings file I have is about 4GB. I currently read the entire file into memory as a dictionary, and whenever I want to map a word to its vector representation I look it up in that dictionary.
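For reference, this is roughly what I do now (a minimal sketch assuming a GloVe-style text file with one word per line followed by its floats; the file name is just a placeholder):

```python
import numpy as np

# Load every vector into a dict up front (this is the memory-heavy part).
embeddings = {}
with open('embeddings.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')
        embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

# Later lookups are plain dict accesses.
vector = embeddings.get('computer')
```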

The memory usage is very high, and I would like to know if there is another way of using word embeddings without loading all of the data into memory.

I have recently come across generators in Python. Could they help me reduce the memory usage?

Thank you!

What task do you have in mind? If this is a similarity-based task, you could simply use the load_word2vec_format method in gensim, which lets you pass a limit on the number of vectors loaded. The vectors in a set like GoogleNews are ordered by frequency, so this gives you the most important vectors. It also makes sense theoretically: words with low frequency usually have relatively poor representations anyway.
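For example, a minimal sketch assuming the GoogleNews binary file and gensim's KeyedVectors (the file name and the limit value are placeholders you would adjust):

```python
from gensim.models import KeyedVectors

# Load only the first 500k vectors. Because the file is sorted by word
# frequency, these are the most common words.
vectors = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin',
    binary=True,
    limit=500_000,
)

# Look up a single word's vector (raises KeyError if the word wasn't loaded).
vec = vectors['computer']

# Similarity queries work on the truncated vocabulary as well.
print(vectors.most_similar('computer', topn=5))
```

With limit set, gensim stops reading after that many rows, so memory usage scales with the limit you choose rather than with the full 4GB file.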
