
How to convert pretrained fastText vectors to gensim model

How do I convert pretrained fastText vectors to a gensim model? I need the predict_output_word method.

import gensim
from gensim.models import Word2Vec
from gensim.models.wrappers import FastText

model_wiki = gensim.models.KeyedVectors.load_word2vec_format("wiki.ru.vec")
model3 = Word2Vec(sentences=model_wiki)

TypeError                                 Traceback (most recent call last)
----> 1 model3 = Word2Vec(sentences=model_wiki)  # train a model from the corpus

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in __init__(self, sentences, corpus_file, size, alpha, window, min_count, max_vocab_size, sample, seed, workers, min_alpha, sg, hs, negative, ns_exponent, cbow_mean, hashfxn, iter, null_word, trim_rule, sorted_vocab, batch_words, compute_loss, callbacks, max_final_vocab)
    765             callbacks=callbacks, batch_words=batch_words, trim_rule=trim_rule, sg=sg, alpha=alpha, window=window,
    766             seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha, compute_loss=compute_loss,
--> 767             fast_version=FAST_VERSION)
    768
    769     def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in __init__(self, sentences, corpus_file, workers, vector_size, epochs, callbacks, batch_words, trim_rule, sg, alpha, window, seed, hs, negative, ns_exponent, cbow_mean, min_alpha, compute_loss, fast_version, **kwargs)
    757                     raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.")
    758
--> 759             self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
    760             self.train(
    761                 sentences=sentences, corpus_file=corpus_file, total_examples=self.corpus_count,

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/base_any2vec.py in build_vocab(self, sentences, corpus_file, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
    934         """
    935         total_words, corpus_count = self.vocabulary.scan_vocab(
--> 936             sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
    937         self.corpus_count = corpus_count
    938         self.corpus_total_words = total_

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in scan_vocab(self, sentences, corpus_file, progress_per, workers, trim_rule)
   1569             sentences = LineSentence(corpus_file)
   1570
-> 1571         total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
   1572
   1573         logger.info(

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/word2vec.py in _scan_vocab(self, sentences, progress_per, trim_rule)
   1538         vocab = defaultdict(int)
   1539         checked_string_types = 0
-> 1540         for sentence_no, sentence in enumerate(sentences):
   1541             if not checked_string_types:
   1542                 if isinstance(sentence, string_types):

~/anaconda3/envs/pym/lib/python3.6/site-packages/gensim/models/keyedvectors.py in __getitem__(self, entities)
    337             return self.get_vector(entities)
    338
--> 339         return vstack([self.get_vector(entity) for entity in entities])
    340
    341     def __contains__(self, entity):

TypeError: 'int' object is not iterable

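According to the Gensim documentation, you can use the gensim.models.wrappers module to:

Load the input-hidden weight matrix from Facebook's native fasttext .bin and .vec output files

Here is an example: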

from gensim.models.wrappers import FastText

model = FastText.load_fasttext_format('wiki.vec')
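
As for the traceback in the question: its last two frames show enumerate(sentences) ending up inside KeyedVectors.__getitem__, which suggests the vectors object was being iterated as if it were a corpus of tokenized sentences. The sketch below reproduces that mechanism in plain Python. ToyKeyedVectors and first_iteration_error are hypothetical names made up for illustration, not gensim code; the toy class only assumes, as the traceback indicates, that __getitem__ accepts either one string key or an iterable of keys.

```python
class ToyKeyedVectors:
    """Hypothetical stand-in for a keyed-vector store; NOT gensim's class."""

    def __init__(self, vectors):
        self._vectors = dict(vectors)  # word -> list of floats

    def get_vector(self, word):
        return self._vectors[word]

    def __getitem__(self, entities):
        # A single string key returns one vector.
        if isinstance(entities, str):
            return self.get_vector(entities)
        # Anything else is treated as an iterable of keys.
        return [self.get_vector(entity) for entity in entities]


def first_iteration_error(obj):
    """Return the TypeError message raised while iterating obj, or None."""
    try:
        for _sentence_no, _sentence in enumerate(obj):
            pass
    except TypeError as exc:
        return str(exc)
    return None


# The class defines no __iter__, so Python falls back to calling
# __getitem__(0), __getitem__(1), ... and the integer 0 is not iterable.
toy = ToyKeyedVectors({"cat": [0.1, 0.2], "dog": [0.3, 0.4]})
error_message = first_iteration_error(toy)
print(error_message)  # 'int' object is not iterable

# A real corpus (an iterable of token lists) iterates without error.
corpus = [["cat", "dog"], ["dog", "cat"]]
corpus_result = first_iteration_error(corpus)
print(corpus_result)  # None
```

In other words, the sentences argument expects tokenized text to train on, while a loaded set of vectors is already the result of training, so it has to be loaded as a model rather than passed in as a corpus.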
