How do I run TF-IDF in Python on a single column of a large dataset (CSV file)?

Question

I am trying to write a Python program that computes TF-IDF over a large dataset. It has multiple columns and several rows of data. My problem is that I don't know how to restrict it to a single column, the one titled Comments.

Answer 1

You can extract the values of the required column and run TF-IDF on them:

from sklearn.feature_extraction.text import TfidfVectorizer

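# df is your pandas DataFrame (for example, loaded with pd.read_csv)
doc = df['Comments'].values  # keep only the Comments column
tf = TfidfVectorizer(stop_words='english')  # ignore common English stop words
tfidf_matrix = tf.fit_transform(doc)  # sparse matrix with one row per comment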
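
For a fuller picture, here is a minimal end-to-end sketch that starts from the CSV itself. The sample data, the `Id` and `Rating` columns, and the in-memory buffer are made up for illustration; only the `Comments` column name comes from the question. It assumes the file may contain empty comment cells, which pandas reads as NaN and which `TfidfVectorizer` cannot tokenize, so they are dropped first.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for a CSV file on disk; with a real file, pass its
# path to pd.read_csv instead of this in-memory buffer.
csv_data = io.StringIO(
    "Id,Comments,Rating\n"
    "1,The delivery was fast and the packaging was great,5\n"
    "2,,3\n"
    "3,Slow delivery but great support,4\n"
)

# usecols loads only the Comments column, which saves memory on wide files.
df = pd.read_csv(csv_data, usecols=["Comments"])

# Empty cells become NaN; drop them so the vectorizer only sees strings.
comments = df["Comments"].dropna().astype(str)

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(comments)

print(tfidf_matrix.shape)  # (number of comments kept, vocabulary size)
```

Reading only the needed column with `usecols` is one way to keep memory use down on a large file; whether that is enough depends on how big the Comments column itself is.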

Hope it helps.

Solution 1
0 2020-02-04 04:25:57
