TF-IDF matrix in Python

Question

My code to calculate TF-IDF for a corpus looks like this:

from sklearn.feature_extraction.text import TfidfVectorizer

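train_set = ["i have a ball", "he is good", "she played well"]
vectorizer = TfidfVectorizer(min_df=1)

# Learn the vocabulary and IDF weights, then return the TF-IDF matrix as a dense array
train_array = vectorizer.fit_transform(train_set).toarray()
# get_feature_names() was removed in scikit-learn 1.2; newer versions use get_feature_names_out()
print(vectorizer.get_feature_names())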
print(train_array)

The output I receive is:

['ball', 'good', 'have', 'he', 'is', 'played', 'she', 'well']

[[0.70710678, 0., 0.70710678, 0., 0., 0., 0., 0.],
 [0., 0.57735027, 0., 0.57735027, 0.57735027, 0., 0., 0.],
 [0., 0., 0., 0., 0., 0.57735027, 0.57735027, 0.57735027]]

The question is: how can I calculate the TF-IDF of the sentence "she is good"? The corpus is the train_set in the code above.

Answer 1

Simply apply your already fitted TF-IDF vectorizer to the new data with its .transform method. This reuses the vocabulary and IDF weights learned from train_set instead of refitting:

In [16]: test = ["she is good"]

In [17]: test_array = vectorizer.transform(test)

In [18]: test_array.A
Out[18]: array([[0., 0.57735027, 0., 0., 0.57735027, 0., 0.57735027, 0.]])

In [19]: vectorizer.get_feature_names()
Out[19]: ['ball', 'good', 'have', 'he', 'is', 'played', 'she', 'well']
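
For reference, here is a self-contained sketch of the whole flow, which also checks the 0.57735027 values by hand. It assumes scikit-learn's default TfidfVectorizer settings (smoothed IDF and L2 normalization); the names weights, nonzero_terms, and manual are illustrative and not part of the original session.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

train_set = ["i have a ball", "he is good", "she played well"]

# Fit on the corpus: this learns the vocabulary and the IDF weights.
vectorizer = TfidfVectorizer(min_df=1)
train_array = vectorizer.fit_transform(train_set).toarray()

# Transform only (no refit): the new sentence is scored with the corpus statistics.
test_array = vectorizer.transform(["she is good"]).toarray()

# vocabulary_ maps each term to its column index in the matrix.
vocab = vectorizer.vocabulary_
weights = {term: test_array[0, col] for term, col in vocab.items()}
nonzero_terms = sorted(term for term, w in weights.items() if w > 0)
print(nonzero_terms)
print(test_array)

# Manual check, assuming the defaults: idf = ln((1 + n) / (1 + df)) + 1,
# followed by L2 normalization of the row.
n_docs = len(train_set)
df = 1  # "she", "is" and "good" each occur in exactly one training document
idf = np.log((1 + n_docs) / (1 + df)) + 1.0
raw = np.array([idf, idf, idf])        # term frequency is 1 for each word
manual = raw / np.linalg.norm(raw)     # every entry becomes 1 / sqrt(3)
print(manual)
```

Because all three words have the same document frequency in train_set, their weights are equal after normalization, which is why each one comes out as 1/sqrt(3).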

Solution 1
3 2017-08-13 20:39:36
