How to use GloVe word-embeddings file on Google Colaboratory
I have downloaded the data with wget:
!wget http://nlp.stanford.edu/data/glove.6B.zip
- ‘glove.6B.zip’ saved [862182613/862182613]
It is saved as a zip, and I would like to use the glove.6B.300d.txt file from the archive. What I want to achieve is:
embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
Of course, I am getting this error:
IOError                                   Traceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]
IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'
How can I unzip the archive and use that file in my code above on Google Colab?
Another way to do it is as follows.
!wget http://nlp.stanford.edu/data/glove.6B.zip
After downloading, the zip file is saved in the /content directory of Google Colab.
!unzip glove*.zip
!ls
!pwd
import numpy as np

print('Indexing word vectors.')
embeddings_index = {}
# Each line holds a word followed by the components of its vector
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))
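The indexing loop can also be wrapped in a small function so it is easy to reuse for any of the dimension files. This is only a sketch: the name load_glove and the expected_dim check are my own additions, not part of GloVe or Colab, and the demo runs on a tiny synthetic file rather than the real glove.6B.100d.txt.

```python
import os
import tempfile

import numpy as np


def load_glove(path, expected_dim=None):
    """Parse a GloVe-style text file into a {word: vector} dict.

    `expected_dim` is a hypothetical safety check: lines whose vector
    length differs from it are skipped instead of being stored.
    """
    embeddings_index = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            values = line.split()
            if len(values) < 2:
                continue  # ignore blank or malformed lines
            coefs = np.asarray(values[1:], dtype='float32')
            if expected_dim is not None and coefs.shape[0] != expected_dim:
                continue
            embeddings_index[values[0]] = coefs
    return embeddings_index


# Tiny synthetic file standing in for the real GloVe file (3 dimensions only).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'demo_vectors.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('the 0.1 0.2 0.3\n')
    f.write('cat 0.4 0.5 0.6\n')

demo_index = load_glove(demo_path, expected_dim=3)
print('Found %s word vectors.' % len(demo_index))
```

On Colab you would call it as load_glove('glove.6B.100d.txt', expected_dim=100) after unzipping.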
!pip install --upgrade pip
!pip install -U -q pydrive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
!mkdir -p drive
!google-drive-ocamlfuse drive
import pickle
# Use a context manager so the file handle is closed after writing
with open('drive/path/to/your/file/location', 'wb') as out:
    pickle.dump({'embeddings_index': embeddings_index}, out)
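To reuse the pickled index in a later session you read it back with pickle.load. A minimal sketch of the round trip follows; save_index, load_index and the temporary file path are hypothetical names for illustration, and the demo uses a plain list instead of a real vector. Only unpickle files you created yourself, since pickle can execute code while loading.

```python
import os
import pickle
import tempfile


def save_index(embeddings_index, path):
    """Write the index in the same {'embeddings_index': ...} layout used above."""
    with open(path, 'wb') as out:
        pickle.dump({'embeddings_index': embeddings_index}, out)


def load_index(path):
    """Read the pickled dict back and return the index stored inside it."""
    with open(path, 'rb') as src:
        return pickle.load(src)['embeddings_index']


# Demo with a temporary file; on Colab the path would point into the mounted drive.
demo_pickle = os.path.join(tempfile.mkdtemp(), 'embeddings_index.pkl')
save_index({'the': [0.1, 0.2, 0.3]}, demo_pickle)
restored = load_index(demo_pickle)
print(sorted(restored))
```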
If you have already downloaded the zip file on your local system, just extract it, upload the file for the required dimension to Google Drive, mount Drive with FUSE, give the appropriate path, and then use it (build an index from it, etc.).
Alternatively, if the file is already on your local system, you can upload it via code in Colab:
from google.colab import files
files.upload()
Select the file and use it as in step 3 onwards.
This is how you can work with GloVe word embeddings in Google Colaboratory. Hope it helps.
It's simple; check out this older post from SO.
import zipfile
zip_ref = zipfile.ZipFile(path_to_zip_file, 'r')
zip_ref.extractall(directory_to_extract_to)
zip_ref.close()
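Since the question only needs one file from the archive, it may be enough to extract just that member, or to read it straight from the zip without extracting anything. Below is a sketch using only the standard zipfile module; the helper names extract_member and iter_lines_from_zip are mine, and the demo builds a small stand-in archive instead of the real glove.6B.zip.

```python
import io
import os
import tempfile
import zipfile


def extract_member(zip_path, member, dest_dir):
    """Extract a single file from the archive and return its path on disk."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        return zip_ref.extract(member, path=dest_dir)


def iter_lines_from_zip(zip_path, member):
    """Yield decoded text lines from one archive member without extracting it."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        with zip_ref.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding='utf-8'):
                yield line


# Build a small stand-in archive with two members.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as zf:
    zf.writestr('demo.300d.txt', 'the 0.1 0.2\ncat 0.3 0.4\n')
    zf.writestr('demo.50d.txt', 'the 0.9\n')

# Only the requested member ends up on disk.
extracted_path = extract_member(demo_zip, 'demo.300d.txt', demo_dir)
words = [line.split()[0] for line in iter_lines_from_zip(demo_zip, 'demo.300d.txt')]
print(extracted_path, words)
```

With the real download this would be extract_member('glove.6B.zip', 'glove.6B.300d.txt', '.'), assuming the member name inside the archive matches the file name used in the question.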
If you have Google Drive, you can:
Mount your Google Drive so that it can be used from a Colab notebook:
from google.colab import drive
drive.mount('/content/gdrive')
Download glove.6B.zip and extract it to a place of your choice on your Google Drive, for example:
"My Drive/Place/Of/Your/Choice/glove.6B.300d.txt"
Open the file directly from your Colab notebook (note that the path includes "My Drive" below the mount point):
with io.open('/content/gdrive/My Drive/Place/Of/Your/Choice/glove.6B.300d.txt', encoding='utf8') as f:
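If you still get "No such file or directory" after mounting, the path is usually slightly off. One way to check is to search under the mount point for the file name. This is a sketch only: find_file is a hypothetical helper, the demo uses a temporary directory in place of /content/gdrive, and walking a large Drive can be slow.

```python
import os
import tempfile


def find_file(root, filename):
    """Return the first path under `root` whose basename is `filename`, else None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None


# Temporary directory standing in for the Drive mount point.
demo_root = tempfile.mkdtemp()
demo_folder = os.path.join(demo_root, 'My Drive', 'Place')
os.makedirs(demo_folder)
with open(os.path.join(demo_folder, 'glove.demo.txt'), 'w', encoding='utf-8') as f:
    f.write('the 0.1 0.2\n')

found = find_file(demo_root, 'glove.demo.txt')
missing = find_file(demo_root, 'not_there.txt')
print(found, missing)
```

On Colab the call would be find_file('/content/gdrive', 'glove.6B.300d.txt'), and the returned path can be passed to io.open directly.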
The top answer is fine.
Just a small addition from my side: if you get an error, this will make it work.
import zipfile