How do I run TF-IDF in Python on a single column from a big data set (CSV file)?
I am attempting to create a Python program that computes TF-IDF on a big data set. It has multiple columns and several rows of data. My problem is that I don't know how to limit it to run only on one of the columns, titled Comments.
You can extract the values of the required column and run TF-IDF on them:
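from sklearn.feature_extraction.text import TfidfVectorizer

# df is your DataFrame, e.g. loaded with pandas.read_csv
doc = df['Comments'].fillna('').astype(str)  # keep only the Comments column; empty cells become ''
tf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tf.fit_transform(doc)  # sparse matrix: one row per comment, one column per term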
Hope it helps.