
Word2Vec: Error when loading a pre-trained word2vec file using Gensim

I receive an error when trying to load a pre-trained word2vec file (trained with fastText) using Gensim. The file has a '.vec' extension and can be found here: http://89.38.230.23/word_embeddings/we/corola.300.20.vec.zip

What I've tried so far: option 1, KeyedVectors from gensim.models; option 2, the FastText wrapper.

    # Option 1
    from gensim.models import KeyedVectors
    model = KeyedVectors.load_word2vec_format('Word_embeddings/corola.300.20.vec', binary=True)

    # Option 2
    from gensim.models.wrappers import FastText
    model = FastText.load_word2vec_format('Word_embeddings/corola.300.20.vec')

Error from option 1: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9b in position 0: invalid start byte

Deprecation warning from option 2: DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.

I need the correct way to load this word2vec file using Gensim.

Thank you.

If the embedding file contains a few malformed byte sequences, it is sometimes fine to pass the unicode_errors='ignore' parameter, which skips undecodable bytes instead of raising an error. Just try:

    model = KeyedVectors.load_word2vec_format('Word_embeddings/corola.300.20.vec', binary=True, unicode_errors='ignore')
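One caveat worth checking first: '.vec' files written by fastText are normally plain text in word2vec format (a "vocab_size dim" header, then one word and its values per line), so binary=True may itself be what triggers the decode error, and binary=False (the default) might be the better fix. The sketch below is only a heuristic for telling the two formats apart before loading. The helper names looks_like_text_word2vec and load_vectors are hypothetical and not part of Gensim, and the tiny sample files exist only for illustration.

```python
import os
import struct
import tempfile


def looks_like_text_word2vec(path, probe_lines=3):
    """Heuristic check (hypothetical helper, not a Gensim function).

    Returns True if the file starts with a 'vocab_size dim' header and the
    next few lines each hold one token followed by `dim` parseable floats.
    """
    with open(path, "rb") as f:
        header = f.readline().split()
        if len(header) != 2 or not all(p.isdigit() for p in header):
            return False
        dim = int(header[1])
        for _ in range(probe_lines):
            line = f.readline()
            if not line:
                break
            # fastText leaves a trailing space on each line, so strip first.
            parts = line.rstrip().split(b" ")
            if len(parts) != dim + 1:
                return False
            try:
                for value in parts[1:]:
                    float(value.decode("ascii"))
            except ValueError:  # also covers UnicodeDecodeError
                return False
    return True


def load_vectors(path):
    """Load with Gensim, choosing `binary` from the heuristic above."""
    from gensim.models import KeyedVectors  # imported lazily; requires gensim
    is_text = looks_like_text_word2vec(path)
    return KeyedVectors.load_word2vec_format(path, binary=not is_text)


# Tiny illustrative files: one in text format, one in binary format.
tmp_dir = tempfile.mkdtemp()

text_path = os.path.join(tmp_dir, "tiny.vec")
with open(text_path, "wb") as f:
    f.write(b"2 3\nhello 0.1 0.2 0.3 \nworld 0.4 0.5 0.6 \n")

binary_path = os.path.join(tmp_dir, "tiny.bin")
with open(binary_path, "wb") as f:
    f.write(b"2 3\n")
    f.write(b"hello " + struct.pack("<3f", 0.1, 0.2, 0.3) + b"\n")
    f.write(b"world " + struct.pack("<3f", 0.4, 0.5, 0.6) + b"\n")

print(looks_like_text_word2vec(text_path))
print(looks_like_text_word2vec(binary_path))
```

If the check reports a text file, calling load_vectors('Word_embeddings/corola.300.20.vec') amounts to KeyedVectors.load_word2vec_format(..., binary=False). Keep unicode_errors='ignore' in reserve for the case where the text file really does contain a few bad bytes.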
