How to use sklearn TfidfVectorizer fit_transform on two columns

I'm not sure whether this is the correct way to apply fit_transform to both of these columns. I'm currently writing a classifier to predict fraudulent job postings, and I'm interested in the 'description' and 'requirements' columns. I don't know if there is a way to do both transforms in the same line.

preprocessor = TfidfVectorizer(stop_words='english', strip_accents='unicode', norm='l2', use_idf=False,smooth_idf=False)
XX = preprocessor.fit_transform(X["description"])
XX = preprocessor.fit_transform(X["requirements"])

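I think that you are misinterpreting the documentation. Calling fit_transform twice on the same vectorizer refits it each time, and the second assignment to XX overwrites the first result. If you want to do tfidf on two columns, then you need to pass two transformers, one per column, to a ColumnTransformer. Something like this: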

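from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# One vectorizer per text column; "a" and "b" are placeholder column names.
# min_df=1 is the default; recent scikit-learn versions appear to reject an integer min_df of 0.
tfidf_1 = TfidfVectorizer(min_df=1)
tfidf_2 = TfidfVectorizer(min_df=1)
clmn = ColumnTransformer([("tfidf_1", tfidf_1, "a"),
                          ("tfidf_2", tfidf_2, "b")],
                         remainder="passthrough")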
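
Applied to the columns from the question, the same idea could look roughly like the sketch below. The toy DataFrame and the make_tfidf helper are made up for illustration and are not part of the original post; the vectorizer settings are copied from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the job-postings DataFrame.
X = pd.DataFrame({
    "description": ["earn money fast from home",
                    "senior python developer wanted"],
    "requirements": ["no experience needed",
                     "five years of python experience"],
})

def make_tfidf():
    # Hypothetical helper: returns a fresh vectorizer with the question's settings.
    return TfidfVectorizer(stop_words="english", strip_accents="unicode",
                           norm="l2", use_idf=False, smooth_idf=False)

# Each column gets its own vectorizer and therefore its own vocabulary.
# Passing the column name as a plain string hands the vectorizer a 1-D
# column of text, which is what TfidfVectorizer expects.
preprocessor = ColumnTransformer([
    ("description", make_tfidf(), "description"),
    ("requirements", make_tfidf(), "requirements"),
])

# A single call fits both vectorizers and stacks their outputs side by side.
XX = preprocessor.fit_transform(X)
print(XX.shape)
```

XX then has one row per posting, and its columns are the features from 'description' followed by the features from 'requirements', so it can be passed to a classifier as a single feature matrix.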

