How do I unzip pretrained word2vec in Google Colab?

Question

I am trying to use Google's pretrained word vectors, GoogleNews-vectors-negative300.bin.gz, in Colab, but I don't know how to unzip the file.

import gzip
f=gzip.open('gdrive/My Drive/Colab Notebooks/LAST/we/GoogleNews-vectors-negative300.bin.gz', 'rt')
file_content=f.read()

I tried to read the file directly with gzip but got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 19: invalid start byte.

Answer 1

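from gensim.models import KeyedVectors
# EMBEDDING_FILE is the path set in the download snippet below; gensim reads the .gz file directly
word2vec = KeyedVectors.load_word2vec_format(EMBEDDING_FILE, binary=True)
x = word2vec["test"]  # numpy array of shape (300,)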

x contains the 300-dimensional vector for the word "test".
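To show what `load_word2vec_format(..., binary=True)` has to parse, here is a minimal stdlib-only sketch of a reader for the word2vec binary layout: an ASCII header line `vocab_size dim`, then each word followed by a space and `dim` float32 values. The helper name `read_word2vec_bin` and its `limit` argument are hypothetical illustrations, not gensim's API, and the sketch assumes little-endian float32 (the layout written by the original C tool on x86 machines). It keeps everything in Python lists, so it is only suitable for small files or a small `limit`; for the full GoogleNews file, use gensim as above.

```python
import gzip
import struct


def read_word2vec_bin(path, limit=None):
    """Parse a (optionally gzipped) word2vec binary file into {word: [floats]}.

    Hypothetical helper for illustration; assumes little-endian float32 vectors.
    """
    opener = gzip.open if path.endswith(".gz") else open
    vectors = {}
    with opener(path, "rb") as f:  # binary mode: no text decoding at all
        # The header is plain ASCII: b"<vocab_size> <dim>\n"
        vocab_size, dim = map(int, f.readline().split())
        if limit is not None:
            vocab_size = min(vocab_size, limit)
        record = struct.Struct("<%df" % dim)  # dim little-endian float32 values
        for _ in range(vocab_size):
            # A word is a run of bytes terminated by one space; skip the
            # newline that separates one record from the next.
            chars = bytearray()
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch != b"\n":
                    chars += ch
            # Replace undecodable bytes instead of failing on them.
            word = chars.decode("utf-8", errors="replace")
            vectors[word] = list(record.unpack(f.read(record.size)))
    return vectors
```

Reading in `"rb"` mode is the key point: the vectors are raw bytes, so no text encoding is applied to them.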

Snippet to download the word2vec model:

EMBEDDING_FILE = '/root/input/GoogleNews-vectors-negative300.bin.gz'
!wget -P /root/input/ -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"

Reference: a publicly available Google Colab notebook.

Answer 2

There are two possible solutions (I've tried both; I'm working on the same problem myself):

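Open the file with encoding='iso8859', e.g. gzip.open(path, 'rt', encoding='iso8859'). This stops the UnicodeDecodeError, but what you read is still the raw binary data, not usable vectors.
Use KeyedVectors.load_word2vec_format(path_to_your_file, binary=True).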
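Why `encoding='iso8859'` avoids the error: ISO-8859-1 maps every byte value to a character, so decoding can never fail, but the float32 vectors just come back as meaningless characters. The sketch below builds a tiny in-memory stand-in for the real file (the names `raw`, `blob` and `open_gz` are hypothetical, and so is the data) and compares UTF-8 text mode, ISO-8859-1 text mode, and binary mode.

```python
import gzip
import io
import struct

# Hypothetical stand-in for the real file: header, one word, three float32 values.
raw = b"1 3\nking " + struct.pack("<3f", 0.25, -1.5, 3.0) + b"\n"
blob = gzip.compress(raw)


def open_gz(mode, **kwargs):
    return gzip.open(io.BytesIO(blob), mode, **kwargs)


# 1) Text mode with UTF-8 (the question's code): the float bytes include 0x80,
#    which is not a valid UTF-8 start byte, so this raises UnicodeDecodeError.
try:
    with open_gz("rt", encoding="utf-8") as f:
        f.read()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# 2) Text mode with ISO-8859-1: never fails, but the floats become mojibake.
#    Text mode also rewrites \r and \r\n to \n by default, which can silently
#    corrupt binary data.
with open_gz("rt", encoding="iso8859") as f:
    latin1_text = f.read()

# 3) Binary mode: read the header, the word, then unpack the float32 values.
with open_gz("rb") as f:
    header = f.readline()                       # b"1 3\n"
    word = f.read(5)                            # b"king " (word plus separating space)
    values = struct.unpack("<3f", f.read(12))   # three little-endian float32 values

print(utf8_ok, values)
```

Only the binary-mode branch yields actual numbers, which is why `load_word2vec_format(..., binary=True)` is the practical option.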

Answer 3

You can decompress it from a notebook cell with:

!gunzip ./GoogleNews-vectors-negative300.bin.gz
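`!gunzip` runs the gunzip command-line tool from the notebook; by default it replaces the `.gz` archive with the decompressed `.bin` file (roughly 3.6 GB for this model). If you prefer to stay in Python, here is a minimal sketch using the standard `gzip` and `shutil` modules; `gunzip_file` is a hypothetical helper name, and unlike the CLI it keeps the original archive. It streams the data in chunks, so the whole file is never held in memory. Unzipping is optional if you load the vectors with gensim, since `load_word2vec_format` reads the `.gz` file directly, as in Answer 1.

```python
import gzip
import shutil


def gunzip_file(src, dst=None, chunk_size=16 * 1024 * 1024):
    """Decompress src (a .gz file) to dst, streaming in chunks.

    Hypothetical helper; unlike the gunzip CLI, the original .gz is kept.
    """
    if dst is None:
        if not src.endswith(".gz"):
            raise ValueError("cannot infer output name; pass dst explicitly")
        dst = src[:-3]  # "x.bin.gz" -> "x.bin"
    # Binary mode on both ends: the bytes are copied without any decoding.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst
```

Usage: `gunzip_file("GoogleNews-vectors-negative300.bin.gz")` writes `GoogleNews-vectors-negative300.bin` next to the archive and returns its path.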

3 answers

Answer 1: score 3, 2019-11-21 05:24:31

Answer 2: score 0, 2019-04-03 03:35:20

Answer 3: score 0, 2020-09-21 10:15:46