How to use GloVe word-embeddings file on Google colaboratory

I have downloaded the data with wget:

!wget http://nlp.stanford.edu/data/glove.6B.zip
 - ‘glove.6B.zip’ saved [862182613/862182613]

It is saved as a zip, and I would like to use the glove.6B.300d.txt file from the zip file. What I want to achieve is:

embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:],dtype='float32')
        embeddings_index[word] = coefs

Of course, I get this error:

IOError                                   Traceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]

IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'

How can I unzip that file and use it in my code above on Google Colab?

Another way to do it is as follows.

1. Download the zip file

!wget http://nlp.stanford.edu/data/glove.6B.zip

After downloading, the zip file is saved in the /content directory of Google Colab.

2. Unzip it

!unzip glove*.zip

3. Get the exact path where the embedding vectors were extracted using

!ls
!pwd

4. Index the vectors

import numpy as np

print('Indexing word vectors.')

embeddings_index = {}
# Each line holds a word followed by its vector components.
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))

5. Mount Google Drive via FUSE

!pip install --upgrade pip
!pip install -U -q pydrive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null

!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

!mkdir -p drive
!google-drive-ocamlfuse drive

6. Save the indexed vectors to Google Drive for reuse

import pickle
# Write the index with a context manager so the file handle is closed.
with open('drive/path/to/your/file/location', 'wb') as f:
    pickle.dump({'embeddings_index': embeddings_index}, f)
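To reuse the saved index in a later session, it has to be loaded back with pickle. A minimal sketch of the round trip, assuming a toy `embeddings_index` and a temporary file in place of the real Drive location (both are stand-ins for illustration, not part of the original steps):

```python
import os
import pickle
import tempfile

import numpy as np

# Toy stand-in for the real embeddings_index built in step 4.
embeddings_index = {
    'the': np.asarray([0.1, 0.2, 0.3], dtype='float32'),
    'cat': np.asarray([0.4, 0.5, 0.6], dtype='float32'),
}

# Hypothetical location; on Colab this would be a path under the mounted drive.
pickle_path = os.path.join(tempfile.mkdtemp(), 'embeddings_index.pkl')

# Save the index; the context manager closes the file handle.
with open(pickle_path, 'wb') as f:
    pickle.dump({'embeddings_index': embeddings_index}, f)

# In a later session: load the dict back instead of re-parsing the text file.
with open(pickle_path, 'rb') as f:
    restored = pickle.load(f)['embeddings_index']
```

Loading the pickle is usually much faster than re-reading the text file, but only unpickle files you created yourself, since pickle can execute arbitrary code on load.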

If you have already downloaded the zip file on your local system, extract it, upload the file with the dimension you need to Google Drive, mount the drive via FUSE, and then give the appropriate path to use it, index it, and so on.

Alternatively, if the file is already downloaded on your local system, you can upload it through code in Colab:

from google.colab import files
files.upload()

Select the file and use it as in step 3 onwards.

This is how you can work with GloVe word embeddings in Google Colaboratory. Hope it helps.

It's simple; check out this older post from SO.

import zipfile
zip_ref = zipfile.ZipFile(path_to_zip_file, 'r')
zip_ref.extractall(directory_to_extract_to)
zip_ref.close()
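If only one file from the archive is needed, the same module can extract just that member. A minimal sketch, where the helper name `extract_member` and the tiny demo archive are made up for illustration (the demo does not use the real GloVe data):

```python
import os
import tempfile
import zipfile


def extract_member(zip_path, member, dest_dir):
    """Extract a single file from a zip archive and return its path on disk."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        if member not in zip_ref.namelist():
            raise KeyError('%s not found in %s' % (member, zip_path))
        return zip_ref.extract(member, dest_dir)


# Demo on a tiny stand-in archive with two members.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as zf:
    zf.writestr('glove.demo.2d.txt', 'the 0.1 0.2\ncat 0.3 0.4\n')
    zf.writestr('other.txt', 'unused\n')

# Only the requested member is written to disk.
extracted = extract_member(demo_zip, 'glove.demo.2d.txt', demo_dir)
```

On Colab the corresponding call would be along the lines of `extract_member('glove.6B.zip', 'glove.6B.300d.txt', '.')`, assuming the archive was saved to the current working directory.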

If you have Google Drive, you can:

  1. Mount your Google Drive so that it can be used from a Colab notebook

    from google.colab import drive
    drive.mount('/content/gdrive')
  2. Download glove.6B.zip and extract it to a place of your choice on your Google Drive, for example

    "My Drive/Place/Of/Your/Choice/glove.6B.300d.txt"
  3. Open the file directly from your Colab notebook

    with io.open('/content/gdrive/My Drive/Place/Of/Your/Choice/glove.6B.300d.txt', encoding='utf8') as f:
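Once the file can be opened, the parsing loop from the question can be wrapped in a function that takes the path as an argument. A minimal sketch, assuming a hypothetical helper name `load_glove` and a small demo file in place of the real embeddings; it also checks that the file exists first, so a wrong mount point or path fails with a clearer message:

```python
import io
import os
import tempfile

import numpy as np


def load_glove(path):
    """Read a GloVe text file into a dict mapping word -> float32 vector."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            'No GloVe file at %r; check the mount point and the path' % path)
    embeddings_index = {}
    with io.open(path, encoding='utf8') as f:
        for line in f:
            values = line.split()
            if not values:  # skip blank lines
                continue
            # First token is the word, the rest are the vector components.
            embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')
    return embeddings_index


# Demo on a tiny stand-in file with 3-dimensional vectors.
demo_path = os.path.join(tempfile.mkdtemp(), 'glove.demo.3d.txt')
with io.open(demo_path, 'w', encoding='utf8') as f:
    f.write(u'the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n')

demo_index = load_glove(demo_path)
```

On Colab the argument would be the full path to the extracted file under the mount point.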

The top answer is fine.

Just a small addition from my side: if you get an error, add this and it will start working.

import zipfile

