
How do I run TF-IDF in Python on a single column from a large dataset (CSV file)?

I am trying to write a Python program that computes TF-IDF over a large dataset. The dataset has multiple columns and many rows of data. My problem is that I don't know how to restrict the computation to a single column, titled Comments.

You can extract the values of the required column and run TF-IDF on them:

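import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv('data.csv')  # 'data.csv' is a placeholder; use the path to your own CSV file
docs = df['Comments'].fillna('').astype(str)  # keep only the Comments column; missing values become empty strings
tf = TfidfVectorizer(stop_words='english')  # ignore common English stop words
tfidf_matrix = tf.fit_transform(docs)  # sparse matrix: one row per comment, one column per term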

Hope it helps.

