How to store a gensim Phrases trigram model after training

Question

I would like to know whether I can store a gensim Phrases model after training it on sentences.

from gensim.models import Phrases

documents = [
    "the mayor of new york was there",
    "human computer interaction and machine learning has now become a trending research area",
    "human computer interaction is interesting",
    "human computer interaction is a pretty interesting subject",
    "human computer interaction is a great and new subject",
    "machine learning can be useful sometimes",
    "new york mayor was present",
    "I love machine learning because it is a new subject area",
    "human computer interaction helps people to get user friendly applications",
]

# Tokenize each document on single spaces.
sentences = [doc.split(" ") for doc in documents]

bigram_transformer = Phrases(sentences)
bigram_sentences = bigram_transformer[sentences]
print("Bigrams - done")
# Here we use a phrase model that detects the collocation of 3 words (trigrams).
trigram_transformer = Phrases(bigram_sentences)
trigram_sentences = trigram_transformer[bigram_sentences]
print("Trigrams - done")

How can I physically store trigram_transformer, for example with pickle, so that I can reuse it later?

Thanks in advance for your help.

Answer 1

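Convert the list (or that part of the data) to a numpy array and save it as a .npy file, which is easy to save and easy to read back. numpy is available on almost all platforms (such as Google Colab, Replit, ...). For more details on saving .npy files with numpy.save(), see the numpy.save() documentation.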

Using pickle is also a good option, but things get a bit tricky when encoding standards differ or similar problems come up.

Answer 2

You can use Gensim's native .save() method:

trigram_transformer.save(TRIPHRASER_PATH)

...then reload it in a similar way:

reloads_trigram_transformer = Phrases.load(TRIPHRASER_PATH)

(Gensim's save/load methods generally use Python pickling, but may handle certain attributes specially for some models and version transitions.)

You can also use Python's own pickle. It should work fine unless/until you try to load a too-old model into a newer version of Gensim in which the Phrases model may have changed.
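For the plain pickle route, a minimal sketch might look like the following. The helper names and the file name are hypothetical, and a plain dict stands in for trigram_transformer so the snippet runs without Gensim; in practice the trained Phrases object would be passed instead. Opening the file in binary mode sidesteps text-encoding problems.

```python
import os
import pickle
import tempfile


def save_with_pickle(obj, path):
    """Serialize obj to path; the file is opened in binary mode."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_with_pickle(path):
    """Load the object pickled at path. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for trigram_transformer (an assumption made for illustration only).
stand_in_model = {"min_count": 5, "threshold": 10.0, "vocab": {"new_york": 2}}

# Hypothetical location; any writable path would do.
pickle_path = os.path.join(tempfile.mkdtemp(), "trigram_transformer.pickle")

save_with_pickle(stand_in_model, pickle_path)
restored = load_with_pickle(pickle_path)
print(restored == stand_in_model)
```

As the answer notes, a file written this way is tied to the library version that produced it, so loading it under a much newer Gensim may fail.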

Answer 1
0 2022-02-03 18:40:22

Answer 2
0 2022-02-04 01:01:51
