In NLP using tf-idf, how to find the frequency of a specific word in a corpus (containing a large number of documents) in Python

Question

How can I find the frequency of an individual word in the corpus using tf-idf? Below is my sample code; I now want to print the frequency of a word. How can I achieve this?

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
corpus = ['This is the first document.',
          'This is the second second document.',
          'And the third one.',
          'Is this the first document?']
X = vectorizer.fit_transform(corpus)  # sparse document-term count matrix
print(vectorizer.get_feature_names_out())  # get_feature_names() on scikit-learn < 1.0
print(X.toarray())
print(vectorizer.vocabulary_.get('document'))

Answer 1

Your vectorizer.vocabulary_ maps each word to its column index in X, not to its count:

print(vectorizer.vocabulary_)

{'this': 8,
 'is': 3,
 'the': 6,
 'first': 2,
 'document': 1,
 'second': 5,
 'and': 0,
 'third': 7,
 'one': 4}

The counts are the column sums of X, so the relative frequency of each word follows from them:

import numpy as np

vocab = vectorizer.vocabulary_
counts = np.asarray(X.sum(axis=0)).ravel()  # total occurrences per column
tot = counts.sum()
frequency = {w: counts[i] / tot for w, i in vocab.items()}
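To look up a single word, as the question asks, something like the following may help. It is only a minimal sketch: the helper name word_stats is hypothetical, and it relies on vocabulary_ giving the column index of a word in X. It returns the total count, the relative frequency, and the number of documents that contain the word.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # rows = documents, columns = words


def word_stats(word):
    """Hypothetical helper: (total count, relative frequency, document frequency)."""
    # vocabulary_ maps a lowercased word to its column index in X
    col = vectorizer.vocabulary_.get(word.lower())
    if col is None:
        return 0, 0.0, 0  # word never seen while fitting
    column = X[:, col].toarray().ravel()  # per-document counts for this word
    count = int(column.sum())
    total_tokens = int(X.sum())  # all counted tokens in the corpus
    doc_freq = int((column > 0).sum())  # documents containing the word
    return count, count / total_tokens, doc_freq


count, freq, doc_freq = word_stats('document')
print(count, freq, doc_freq)
```

If a tf-idf weight is wanted rather than a raw frequency, TfidfVectorizer can be swapped in for CountVectorizer; the column lookup through vocabulary_ works the same way, but the column then holds weights instead of counts.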
