Unable to load a pre-trained gensim Doc2Vec model from published data

Question

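I want to use the trained Doc2Vec model from a published paper.

Paper

Whalen, R., Lungeanu, A., DeChurch, L., & Contractor, N. (2020). Patent Similarity Data and Innovation Metrics. Journal of Empirical Legal Studies, 17(3), 615-639. https://doi.org/10.1111/jels.12261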

Code

https://github.com/ryanwhalen/patent_similarity_data

Data

https://zenodo.org/record/3552078#.YeWkFvgxmUk

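However, trying to load the model (patent_doc2v_10e.model) raises an error. Edit: the file can be downloaded from the data repository (link above). I am neither an author of the paper nor the creator of the model.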

from gensim.models.doc2vec import Doc2Vec
model = Doc2Vec.load("patent_doc2v_10e.model")


FileNotFoundError: [Errno 2] No such file or directory: 'patent_doc2v_10e.model.trainables.syn1neg.npy'

Am I missing a file, or do I have to load the model some other way?

Answer 1

Where did the file patent_doc2v_10e.model come from?

If trying to load that file produces an error naming another file, patent_doc2v_10e.model.trainables.syn1neg.npy, then that other file is a necessary part of the full model. It should have been created alongside patent_doc2v_10e.model when the model was first persisted to disk with .save().

You'll need to go back to where patent_doc2v_10e.model was created and find the missing patent_doc2v_10e.model.trainables.syn1neg.npy file (and possibly others whose names also start with patent_doc2v_10e.model…). All files created by the same .save() must be kept and moved together, in the same filesystem path, for any future .load() to succeed.
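To see which companion files actually sit next to the main model file, something like the sketch below may help. It uses only the Python standard library. The helper names find_companion_files and missing_companions are made up for illustration, and the only expected suffix listed is the one taken from the error message above; a given model may need other files that are not listed here.

```python
from pathlib import Path


def find_companion_files(model_path):
    """Return sorted names of files in the model's directory whose names
    start with the model file name followed by a dot."""
    model_path = Path(model_path)
    prefix = model_path.name + "."
    return sorted(
        p.name
        for p in model_path.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


def missing_companions(model_path, expected_suffixes):
    """Return the expected companion file names that are not on disk."""
    model_path = Path(model_path)
    return [
        model_path.name + suffix
        for suffix in expected_suffixes
        if not (model_path.parent / (model_path.name + suffix)).exists()
    ]


if __name__ == "__main__":
    model_file = "patent_doc2v_10e.model"
    # Assumption: this suffix comes from the FileNotFoundError in the
    # question; other required files, if any, are unknown.
    expected = [".trainables.syn1neg.npy"]
    print("Found next to the model:", find_companion_files(model_file))
    print("Missing:", missing_companions(model_file, expected))
```

If the "Missing" list is not empty, the download or copy is incomplete, and .load() will keep failing until those files are placed in the same directory as the main model file.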

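(Also, if you train on the original data yourself, I'd recommend making sure you use a current version of Gensim. Only older, pre-4.0 versions create save files with trainables in their names.)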
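A quick way to check which Gensim generation is installed is sketched below. The helper is_pre_4 is a hypothetical name, and it assumes the version string starts with an integer major version, as in "3.8.3" or "4.1.2".

```python
def is_pre_4(version_string):
    """Return True if the major version number is below 4."""
    major = int(version_string.split(".")[0])
    return major < 4


if __name__ == "__main__":
    try:
        import gensim
    except ImportError:
        print("gensim is not installed in this environment")
    else:
        print("Installed gensim version:", gensim.__version__)
        if is_pre_4(gensim.__version__):
            print("Pre-4.0 release: saved files may include 'trainables' in their names")
        else:
            print("4.0 or later")
```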
