Calculate tf-idf in Gensim for my vocabulary

I have a set of words (n-grams) for which I need to calculate tf-idf values. These words are:

myvocabulary = ['tim tam', 'jam', 'fresh milk', 'chocolates', 'biscuit pudding']

My corpus looks as follows.

corpus = {1: "making chocolates biscuit pudding easy first get your favourite biscuit chocolates", 2: "tim tam drink new recipe that yummy and tasty more thicker than typical milkshake that uses normal chocolates", 3: "making chocolates drink different way using fresh milk egg"}

I am currently getting tf-idf values for my n-grams in myvocabulary using sklearn as follows.

from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(vocabulary = myvocabulary, ngram_range = (1,3))
tfs = tfidf.fit_transform(corpus.values())

However, I am interested in doing the same in Gensim. All the examples I came across in Gensim:

  1. use only unigrams (I want it for bigrams and trigrams as well)
  2. calculate values for all the words (I only want to calculate them for the words in myvocabulary)

Hence, please help me to find out how to do the above two things in Gensim.

In Gensim, you should build the dictionary with the gensim.corpora.Dictionary class; take a look at the examples.

Unfortunately, we do not support n-grams in general, only bigrams, through the Phrases class.
