How to store the Phrase trigrams gensim model after training
I would like to know whether I can store a gensim Phrases model after training it on my sentences.
from gensim.models.phrases import Phrases

documents = [
    "the mayor of new york was there",
    "human computer interaction and machine learning has now become a trending research area",
    "human computer interaction is interesting",
    "human computer interaction is a pretty interesting subject",
    "human computer interaction is a great and new subject",
    "machine learning can be useful sometimes",
    "new york mayor was present",
    "I love machine learning because it is a new subject area",
    "human computer interaction helps people to get user friendly applications",
]
sentences = [doc.split(" ") for doc in documents]
bigram_transformer = Phrases(sentences)
bigram_sentences = bigram_transformer[sentences]
print("Bigrams - done")
# Here we use a phrase model that detects the collocation of 3 words (trigrams).
trigram_transformer = Phrases(bigram_sentences)
trigram_sentences = trigram_transformer[bigram_sentences]
print("Trigrams - done")
How can I physically store trigram_transformer, for example with pickle, so that I can reuse it later?
Thanks in advance for your help.
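Convert the list, or that part of it, into a NumPy array and save it as a .npy file, which is easy to save and easy to read. NumPy is available on almost all platforms (such as Google Colab, replit...). For more details on saving .npy files, see the numpy.save() documentation.
Using pickle is also a good option, but things get a bit tricky when encoding standards differ and similar problems come up.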
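As a rough illustration of the NumPy route, the sketch below saves a list of phrase-merged sentences to a .npy file and reads it back. The sample tokens and the file name are made up for the example. Note that this stores the transformed output rather than the trained Phrases model itself, and that NumPy pickles object arrays internally, so loading needs allow_pickle=True.

```python
import os
import tempfile

import numpy as np

# Hypothetical phrase-merged output, standing in for trigram_sentences.
trigram_sentences = [
    ["the", "mayor", "of", "new_york", "was", "there"],
    ["new_york", "mayor", "was", "present"],
]

# Build a 1-D object array element by element so each row stays a Python list,
# even when the sentences have different lengths.
sentence_array = np.empty(len(trigram_sentences), dtype=object)
for index, tokens in enumerate(trigram_sentences):
    sentence_array[index] = tokens

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory.
    npy_path = os.path.join(tmp_dir, "trigram_sentences.npy")
    np.save(npy_path, sentence_array)
    # Object arrays are pickled inside the .npy file, so this flag is required.
    restored_array = np.load(npy_path, allow_pickle=True)

restored_sentences = restored_array.tolist()
print(restored_sentences == trigram_sentences)
```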
You can use Gensim's native .save() method:
trigram_transformer.save(TRIPHRASER_PATH)
...and then reload it in a similar way:
reloaded_trigram_transformer = Phrases.load(TRIPHRASER_PATH)
(Gensim's save/load methods generally use Python pickling, but they may handle certain attributes specially for some models and version transitions.)
You could also use Python's own pickle. It should work fine unless/until you try to load a too-old model into a newer version of Gensim in which the Phrases model may have changed.