
Google Drive pretrained model file can't be opened for loading?

I have deployed a demo of a product using Heroku, and I would like to load a pretrained fastText model from a file in my Google Drive. The fasttext library installed successfully via my Heroku requirements file.

The file I want to load is cc.en.300.bin, which is available here: https://fasttext.cc/docs/en/crawl-vectors.html. Unfortunately, they only provide it in .bin.gz format, so I downloaded the file, unzipped it, and uploaded it to my Google Drive. Then I got a sharing link to the file: https://drive.google.com/file/d/1pP_XQy_svafzvbCHUXC3Tc8yUriiex1-/view?usp=sharing .

But when I run this code:

import fasttext
ft = fasttext.load_model('https://drive.google.com/file/d/1pP_XQy_svafzvbCHUXC3Tc8yUriiex1-/view?usp=sharing')

I get the following error in the Heroku logs:

2022-03-03T19:36:24.688000+00:00 app[web.1]: ValueError: https://drive.google.com/file/d/1pP_XQy_svafzvbCHUXC3Tc8yUriiex1-/view?usp=sharing cannot be opened for loading!
2022-03-03T19:36:24.935616+00:00 heroku[web.1]: Process exited with status 1
2022-03-03T19:36:25.071266+00:00 heroku[web.1]: State changed from starting to crashed

I'm not sure why it's not loading. I also can't really bundle the file with the app, because I think it would be too large (it's about 7 GB in its .bin format). So I'm not sure what to do.

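fasttext.load_model expects the path of a local file, so it cannot open a URL, and a Google Drive sharing link points to a viewer web page rather than to the file itself. To read a file from Drive programmatically you need the Google API client and auth libraries.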

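In this case you can instead download the original archive, decompress it to local disk, and load it from there:

import gzip, shutil, fasttext
from urllib.request import urlopen

url = "https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz"  # replace the url if it's wrong
with urlopen(url) as r, gzip.open(r) as gz, open("cc.en.300.bin", "wb") as f:
    shutil.copyfileobj(gz, f)  # stream-decompress to disk; load_model needs a local, uncompressed file
ft = fasttext.load_model("cc.en.300.bin")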
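If the file has to stay on Google Drive, the sharing link from the question would first have to be turned into something that serves the file rather than the viewer page. The sketch below only shows the URL rewriting. The helper names drive_file_id and direct_download_url are made up for illustration, and the uc?export=download pattern is a widely used convention rather than something the question or answer confirms. For a file as large as 7 GB, Drive will probably respond with a confirmation page instead of the raw bytes, so extra handling (or the Drive API mentioned above) would still be needed, and the result would still have to be saved locally before calling fasttext.load_model.

```python
import re
from urllib.parse import urlparse, parse_qs


def drive_file_id(share_url):
    """Return the file ID embedded in a Google Drive sharing URL, or None."""
    parsed = urlparse(share_url)
    # Links of the form https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)
    # Links of the form https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get("id")
    return ids[0] if ids else None


def direct_download_url(share_url):
    """Build a direct-download style URL (assumed pattern, see the note above)."""
    file_id = drive_file_id(share_url)
    if file_id is None:
        raise ValueError(f"no Drive file ID found in {share_url!r}")
    return f"https://drive.google.com/uc?export=download&id={file_id}"


share = "https://drive.google.com/file/d/1pP_XQy_svafzvbCHUXC3Tc8yUriiex1-/view?usp=sharing"
print(direct_download_url(share))
```

The rewritten URL could then be used in place of the fbaipublicfiles URL in the download step, without the gzip layer, since the uploaded file is already decompressed.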
