Word2Vec: Error when loading a pre-trained word2vec file with Gensim

Question

I receive an error when trying to load a pre-trained word2vec file (produced with fastText) using Gensim. The file has a '.vec' extension and can be found here: http://89.38.230.23/word_embeddings/we/corola.300.20.vec.zip

What I've tried so far:
Option 1: KeyedVectors from gensim.models
Option 2: the FastText wrapper

    # Option 1
    from gensim.models import KeyedVectors
    model = KeyedVectors.load_word2vec_format('Word_embeddings/corola.300.20.vec', binary=True)

    # Option 2
    from gensim.models.wrappers import FastText
    model = FastText.load_word2vec_format('Word_embeddings/corola.300.20.vec')

Error with option 1: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9b in position 0: invalid start byte

Warning with option 2: DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.

I need the correct way to load the word2vec file with gensim.

Thank you.

Answer 1

Sometimes it's fine to pass the unicode_errors='ignore' parameter, since the word embedding file can contain badly encoded entries. Just try:

    from gensim.models import KeyedVectors
    model = KeyedVectors.load_word2vec_format('Word_embeddings/corola.300.20.vec', binary=True, unicode_errors='ignore')
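
One more thing that may be worth checking: fastText '.vec' files are normally plain text in the word2vec text format (a "vocab_size dimensions" header line, then one word and its values per line). Passing binary=True asks Gensim to read raw floats instead, which could be what triggers the UnicodeDecodeError in the first place. The sketch below is only an illustration under that assumption: looks_like_text_word2vec and load_vectors are hypothetical helper names, not part of Gensim, and the check inspects just the first two lines of the file.

```python
def looks_like_text_word2vec(path):
    """Heuristic: True if the file starts like a text-format word2vec file.

    Checks for a '<vocab_size> <dimensions>' header, followed by a line
    holding one word and exactly <dimensions> float values.
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            header = fh.readline().split()
            if len(header) != 2 or not (header[0].isdigit() and header[1].isdigit()):
                return False
            dim = int(header[1])
            # fastText often leaves a trailing space on each line, so strip it.
            parts = fh.readline().rstrip().split(" ")
            if len(parts) != dim + 1:
                return False
            for value in parts[1:]:
                float(value)  # raises ValueError if this is not a number
            return True
    except ValueError:
        # Covers UnicodeDecodeError too (it is a subclass of ValueError),
        # which is what raw binary floats usually produce when decoded.
        return False


def load_vectors(path):
    """Load vectors with Gensim, choosing `binary` from the check above."""
    # Imported lazily so the format check works without gensim installed.
    from gensim.models import KeyedVectors

    binary = not looks_like_text_word2vec(path)
    return KeyedVectors.load_word2vec_format(
        path, binary=binary, unicode_errors="ignore"
    )
```

If the check returns True for 'Word_embeddings/corola.300.20.vec', calling KeyedVectors.load_word2vec_format with binary=False (the default) is probably the setting to try first, with unicode_errors='ignore' kept as a fallback for stray bad bytes.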

Answer 1, 2019-06-25 23:28:00
