How can I find the frequency of an individual word in a corpus using TF-IDF? Below is my sample code. I now want to print the frequency of a word. How can I achieve this?
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = ['This is the first document.',
'This is the second second document.',
'And the third one.',
'Is this the first document?',]
X = vectorizer.fit_transform(corpus)
X
print(vectorizer.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2
X.toarray()
vectorizer.vocabulary_.get('document')
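Your vectorizer.vocabulary_
maps each word to its column index in X, not to its count:
print(vectorizer.vocabulary_)
{'this': 8,
'is': 3,
'the': 6,
'first': 2,
'document': 1,
'second': 5,
'and': 0,
'third': 7,
'one': 4}
To get the counts, sum each column of X over all documents. Dividing by the total number of words then gives the frequency:
vocab = vectorizer.vocabulary_
counts = X.toarray().sum(axis=0)  # total occurrences of each word in the corpus
tot = counts.sum()  # total number of words in the corpus
frequency = {w: counts[i] / tot for w, i in vocab.items()}
print(frequency['document'])  # frequency of a single word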
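As a cross-check that does not depend on scikit-learn, the per-word counts and relative frequencies for the same corpus can be sketched with the standard library alone. This is only a minimal sketch: the helper name word_counts and the constant TOKEN_PATTERN are made up for illustration, and the regular expression is an assumption meant to mimic CountVectorizer's default behaviour (lowercasing, tokens of two or more word characters).

```python
import re
from collections import Counter

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

# Assumption: approximates CountVectorizer's default tokenization
# (lowercase, tokens of at least two word characters).
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def word_counts(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(TOKEN_PATTERN.findall(doc.lower()))
    return counts


counts = word_counts(corpus)
total = sum(counts.values())  # total number of tokens in the corpus
frequency = {word: n / total for word, n in counts.items()}

print(counts['document'])     # 3
print(frequency['document'])  # 0.15
```

Here 'document' appears 3 times among 20 tokens, so its relative frequency is 0.15. The scikit-learn approach should give the same figure as long as the vectorizer uses its default settings.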