Unable to load pretrained gensim Doc2Vec from published data

Question

I want to use an already-trained Doc2Vec model from a published paper.

Paper

Whalen, R., Lungeanu, A., DeChurch, L., & Contractor, N. (2020). Patent Similarity Data and Innovation Metrics. Journal of Empirical Legal Studies, 17(3), 615–639. https://doi.org/10.1111/jels.12261

Code

https://github.com/ryanwhalen/patent_similarity_data

Data

https://zenodo.org/record/3552078#.YeWkFvgxmUk

However, when I try to load the model (patent_doc2v_10e.model), an error is raised. Edit: the file can be downloaded from the data repository (link above). I am neither the author of the paper nor the creator of the model.

from gensim.models.doc2vec import Doc2Vec
model = Doc2Vec.load("patent_doc2v_10e.model")


FileNotFoundError: [Errno 2] No such file or directory: 'patent_doc2v_10e.model.trainables.syn1neg.npy'

Am I missing files, or do I have to load the model some other way?
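One way to answer the "am I missing files?" part directly is to wrap the load call so that, on failure, it reports which companion file was missing and which files with the same name prefix actually are present. This is a minimal sketch, not part of Gensim: the helper name load_or_report is hypothetical, and it assumes (as the traceback above suggests) that the raised FileNotFoundError carries the missing path in its filename attribute.

```python
from pathlib import Path
from typing import Any, Callable


def load_or_report(path: str, loader: Callable[[str], Any]) -> Any:
    """Hypothetical helper: call loader(path); if a companion file is missing,
    re-raise with the list of files that ARE present next to the main file."""
    try:
        return loader(path)
    except FileNotFoundError as err:
        model = Path(path)
        # Files written by one .save() share the main file's name as a prefix.
        present = (
            sorted(f.name for f in model.parent.iterdir() if f.name.startswith(model.name))
            if model.parent.is_dir()
            else []
        )
        missing = Path(err.filename).name if err.filename else "(unknown)"
        raise FileNotFoundError(
            f"missing companion file {missing!r}; files present with prefix "
            f"{model.name!r}: {present}"
        ) from err
```

With Gensim installed, usage would look like load_or_report("patent_doc2v_10e.model", Doc2Vec.load). If the report lists only patent_doc2v_10e.model itself, the download is incomplete rather than the loading method being wrong.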

Answer 1

Where did the file patent_doc2v_10e.model come from?

If trying to load that file generates an error about another file named patent_doc2v_10e.model.trainables.syn1neg.npy, then that other file is a necessary part of the full model. It should have been created alongside patent_doc2v_10e.model when that file was first persisted to disk with .save().

You'll need to go back to wherever patent_doc2v_10e.model was created and find the missing patent_doc2v_10e.model.trainables.syn1neg.npy file (and possibly others whose names also start with patent_doc2v_10e.model… ). All such files created by the same .save() must be kept or moved together, at the same filesystem path, for any future .load() to succeed.
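Since every file written by one .save() shares the main file's name as a prefix, keeping them together can be scripted. The sketch below is an illustration under that assumption; companion_files and copy_model are hypothetical helper names, not Gensim functions, and plain prefix matching will also pick up any unrelated file that happens to share the prefix (for example a manual .bak copy).

```python
import shutil
from pathlib import Path
from typing import List


def companion_files(model_path: str) -> List[Path]:
    """Hypothetical helper: the main save file plus every sibling file whose
    name starts with it (e.g. '<model>.trainables.syn1neg.npy')."""
    p = Path(model_path)
    return sorted(
        f for f in p.parent.iterdir() if f.is_file() and f.name.startswith(p.name)
    )


def copy_model(model_path: str, dest_dir: str) -> List[Path]:
    """Copy all files from one .save() into dest_dir, keeping their names,
    so a later .load() from dest_dir finds them side by side."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in companion_files(model_path):
        shutil.copy2(f, dest / f.name)
        copied.append(dest / f.name)
    return copied
```

Copying rather than renaming any single file matters: the main file refers to its companions by their original names, so all of them must keep the same prefix and sit in the same directory.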

(Additionally, if you are training these models yourself from the original data, I'd suggest using a current version of Gensim. Only older, pre-4.0 versions create save files with trainables in the name.)
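The naming rule above gives a quick way to tell which Gensim generation wrote a set of save files, and to compare it against the version installed locally. This is a heuristic sketch built only on that rule; looks_like_pre4_save and installed_gensim_version are hypothetical names, and the version lookup uses the standard library's importlib.metadata (Python 3.8+).

```python
from importlib import metadata
from typing import Iterable, Optional


def looks_like_pre4_save(file_names: Iterable[str]) -> bool:
    """Heuristic: only Gensim versions before 4.0 write save files with
    'trainables' in the name (e.g. '<model>.trainables.syn1neg.npy')."""
    return any(".trainables." in name for name in file_names)


def installed_gensim_version() -> Optional[str]:
    """Return the installed Gensim version string, or None if not installed."""
    try:
        return metadata.version("gensim")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    names = ["patent_doc2v_10e.model", "patent_doc2v_10e.model.trainables.syn1neg.npy"]
    print(looks_like_pre4_save(names))  # True
    print(installed_gensim_version())
```

Here the missing file's own name already contains trainables, which indicates the published model was saved with a pre-4.0 Gensim.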

Answer 1 (score 0), 2022-01-17 18:37:19