
How to store the Phrases trigram gensim model after training

I would like to know how I can store a gensim Phrases model after training it on sentences.

from gensim.models.phrases import Phrases

documents = [
    "the mayor of new york was there",
    "human computer interaction and machine learning has now become a trending research area",
    "human computer interaction is interesting",
    "human computer interaction is a pretty interesting subject",
    "human computer interaction is a great and new subject",
    "machine learning can be useful sometimes",
    "new york mayor was present",
    "I love machine learning because it is a new subject area",
    "human computer interaction helps people to get user friendly applications",
]

# Tokenize each document on single spaces.
sentences = [doc.split(" ") for doc in documents]

# First pass: a phrase model that detects two-word collocations (bigrams).
bigram_transformer = Phrases(sentences)
bigram_sentences = bigram_transformer[sentences]
print("Bigrams - done")
# Here we use a phrase model that detects the collocation of 3 words (trigrams).
trigram_transformer = Phrases(bigram_sentences)
trigram_sentences = trigram_transformer[bigram_sentences]
print("Trigrams - done")

How can I physically store trigram_transformer so that I can reuse it later, for example with pickle?

Thanks in advance for your help.

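Convert the list (or that part of the data) to a numpy array and save it as a .npy file, which is easy to save and easy to read back. Using numpy has the advantage of working on nearly all platforms (Google Colab, Replit, ...). For more details on saving .npy files, see the documentation for numpy.save().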
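A minimal sketch of the numpy suggestion above. Note that it stores the phrase-merged token lists (the output of the transformer), not the Phrases model itself. The sample data, the variable names and the file name are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Stand-in for trigram_sentences: phrase-merged token lists of varying length.
phrased_sentences = [
    ["new_york", "mayor", "was", "present"],
    ["machine_learning", "can", "be", "useful", "sometimes"],
]

# Build a 1-D object array element by element so each entry stays a Python list.
array = np.empty(len(phrased_sentences), dtype=object)
for i, tokens in enumerate(phrased_sentences):
    array[i] = tokens

# Hypothetical output location.
npy_path = os.path.join(tempfile.mkdtemp(), "trigram_sentences.npy")
np.save(npy_path, array)

# Object arrays are stored via pickle, so loading needs allow_pickle=True.
loaded_sentences = np.load(npy_path, allow_pickle=True).tolist()
print(loaded_sentences == phrased_sentences)
```

Because an object array is pickled internally, this approach carries the same caveats as using pickle directly.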

Using pickle is also a good option, but things can get a bit tricky when encoding standards differ or similar issues come up.

You can use Gensim's native .save() method:

trigram_transformer.save(TRIPHRASER_PATH)

...and then reload it in a similar way:

reloaded_trigram_transformer = Phrases.load(TRIPHRASER_PATH)

(Gensim's save/load methods generally use Python pickling, but for some models and version transitions they may handle certain attributes specially.)

You can also use Python's own pickle. It should work fine unless/until you try to load a too-old model into a newer version of Gensim in which the Phrases model may have changed.
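A minimal sketch of the plain-pickle route. The helper names save_with_pickle and load_with_pickle and the file name are hypothetical, and a small dict stands in for the trained model so the snippet runs without Gensim; in practice trigram_transformer would be passed instead.

```python
import os
import pickle
import tempfile


def save_with_pickle(obj, path):
    """Serialize any picklable object to `path` (hypothetical helper)."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_with_pickle(path):
    """Load an object previously written by save_with_pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a trained model; in practice pass trigram_transformer here.
stand_in_model = {"new_york": 3, "machine_learning": 4}

# Hypothetical output location.
pickle_path = os.path.join(tempfile.mkdtemp(), "trigram_phrases.pkl")
save_with_pickle(stand_in_model, pickle_path)
restored = load_with_pickle(pickle_path)
print(restored == stand_in_model)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.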
