How to use GloVe word-embedding files on Google Colaboratory

Question

I have downloaded the data with wget:

!wget http://nlp.stanford.edu/data/glove.6B.zip
 - ‘glove.6B.zip’ saved [862182613/862182613]

It is saved as a zip archive, and I would like to use the glove.6B.300d.txt file from it. What I want to achieve is:

embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:],dtype='float32')
        embeddings_index[word] = coefs

Of course, I get this error:

IOErrorTraceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]

IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'

How can I unzip the archive and use that file in the code above on Google Colab?

Answer 1

Another way to do this is as follows.

1. Download the zip file

!wget http://nlp.stanford.edu/data/glove.6B.zip

After the download, the zip file is saved in the /content directory of Google Colab.

2. Unzip it

!unzip glove*.zip

3. Get the exact path where the embedding vectors were extracted, using

!ls
!pwd
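
The same check can be done from Python instead of shell commands. A minimal sketch, assuming the archive was unzipped into the current working directory; `find_glove_files` is a hypothetical helper name, not part of any library:

```python
import glob
import os


def find_glove_files(directory="."):
    """Return absolute paths of extracted GloVe text files in `directory`, sorted."""
    pattern = os.path.join(directory, "glove.*.txt")
    return sorted(os.path.abspath(p) for p in glob.glob(pattern))


if __name__ == "__main__":
    print("Working directory:", os.getcwd())
    for path in find_glove_files():
        print(path)
```

Any path it prints can be passed directly to `open()` in the next step.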

4. Index the vectors

print('Indexing word vectors.')

import numpy as np

embeddings_index = {}
# Each line holds a word followed by its vector components, separated by spaces.
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))
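
The indexing loop can also be wrapped in a reusable function, so that switching to another file from the archive (for example glove.6B.300d.txt, as in the question) only means changing the path. This is a sketch, not part of the original answer: `load_glove` is a hypothetical name, and the `expected_dim` check is an added safeguard that skips lines whose vector length does not match.

```python
import numpy as np


def load_glove(path, expected_dim=None):
    """Parse a GloVe text file into a dict mapping word -> float32 vector."""
    embeddings_index = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            if not values:
                continue  # ignore blank lines
            coefs = np.asarray(values[1:], dtype="float32")
            # Optional safeguard: skip lines with an unexpected vector length.
            if expected_dim is not None and coefs.shape[0] != expected_dim:
                continue
            embeddings_index[values[0]] = coefs
    return embeddings_index


# Usage (assumed file name):
# embeddings_index = load_glove("glove.6B.300d.txt", expected_dim=300)
```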

5. Mount Google Drive via FUSE

!pip install --upgrade pip
!pip install -U -q pydrive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null

!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

!mkdir -p drive
!google-drive-ocamlfuse drive

6. Save the indexed vectors to Google Drive for reuse

import pickle
# Use a context manager so the file is flushed and closed after writing.
with open('drive/path/to/your/file/location', 'wb') as f:
    pickle.dump({'embeddings_index': embeddings_index}, f)
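
In a later session the saved index can be read back instead of re-parsing the text file. A minimal sketch, assuming the pickle was written with the `{'embeddings_index': ...}` layout from step 6; `load_index` is a hypothetical helper name, and the path in the usage line is the same placeholder as above:

```python
import pickle


def load_index(path):
    """Load a pickle written in step 6 and return the word -> vector dict."""
    with open(path, "rb") as f:
        return pickle.load(f)["embeddings_index"]


# Usage (placeholder path):
# embeddings_index = load_index("drive/path/to/your/file/location")
```

Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.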

If you have already downloaded the zip file to your local system, extract it, upload the file for the dimension you need to Google Drive, mount Drive as in step 5, and then point the code at the appropriate path to use or index it.

Alternatively, if the file is already on your local system, you can upload it from code in Colab:

from google.colab import files
files.upload()

Select the file and use it as in step 3 onwards.

This is how you can work with GloVe word embeddings in Google Colaboratory. Hope it helps.

Answer 2

It's simple; check out this older post from SO.

import zipfile
zip_ref = zipfile.ZipFile(path_to_zip_file, 'r')
zip_ref.extractall(directory_to_extract_to)
zip_ref.close()
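
Since only one file from the archive is needed, it is also possible to extract just that member rather than everything. A minimal sketch using the standard-library `zipfile` module; `extract_member` is a hypothetical helper name, and the file names in the usage line assume the archive sits in the current directory:

```python
import zipfile


def extract_member(zip_path, member, dest_dir="."):
    """Extract a single file from a zip archive and return the path it was written to."""
    # The context manager closes the archive even if extraction fails.
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        return zip_ref.extract(member, path=dest_dir)


# Usage (assumed file names):
# extract_member("glove.6B.zip", "glove.6B.300d.txt")
```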

Answer 3

If you have Google Drive, you can:

Mount your Google Drive so that it can be used from a Colab notebook:
```
from google.colab import drive
drive.mount('/content/gdrive')
```
Download glove.6B.zip and extract it to a place of your choice on your Google Drive, for example:
```
"My Drive/Place/Of/Your/Choice/glove.6B.300d.txt"
```

Open the file directly from your Colab notebook:

with io.open('/content/gdrive/My Drive/Place/Of/Your/Choice/glove.6B.300d.txt', encoding='utf8') as f:

Answer 4

The top answer is fine.

Just a small addition from my side, which should get it working if you hit an error.

import zipfile

Answer summary (4 answers)

Answer 1: 25 votes, posted 2018-09-03 10:42:49
Answer 2: 3 votes, accepted, posted 2018-04-27 10:20:13
Answer 3: 1 vote, posted 2018-11-09 14:36:38
Answer 4: 0 votes, posted 2022-06-09 09:51:26
