How to use sklearn TfidfVectorizer fit_transform on two columns
I'm not sure whether this is the correct way to apply fit_transform to both of these columns. I am currently writing a classifier to predict fraudulent job postings, and I'm interested in the 'description' and 'requirements' columns. I don't know whether there is a way to do both transforms in one line.
preprocessor = TfidfVectorizer(stop_words='english', strip_accents='unicode', norm='l2', use_idf=False, smooth_idf=False)
XX = preprocessor.fit_transform(X["description"])
XX = preprocessor.fit_transform(X["requirements"])
I think you are misinterpreting the documentation. In your code, the second fit_transform call refits the vectorizer and overwrites XX, so only the 'requirements' features survive. If you want to apply tf-idf to two columns, you need to pass two transformers, one per column. Something like this:
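from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# One vectorizer per text column; "a" and "b" stand in for your column names.
# min_df=1 is the default; recent scikit-learn versions reject the integer 0.
tfidf_1 = TfidfVectorizer(min_df=1)
tfidf_2 = TfidfVectorizer(min_df=1)
clmn = ColumnTransformer([("tfidf_1", tfidf_1, "a"),
                          ("tfidf_2", tfidf_2, "b")],
                         remainder="passthrough")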
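To make this concrete for the columns in the question, here is a minimal sketch. The toy DataFrame and the transformer names "desc" and "req" are invented for illustration, and the vectorizer settings are only an example, not necessarily the ones you want. The result has one row per posting, with the tf-idf features of both columns placed side by side.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data, made up for illustration; the real X comes from the job-postings dataset.
X = pd.DataFrame({
    "description": [
        "Earn money fast from home",
        "Software engineer for backend team",
        "Data entry clerk wanted",
    ],
    "requirements": [
        "No experience needed",
        "Python and SQL experience",
        "Typing skills required",
    ],
})

# Each column gets its own vectorizer and therefore its own vocabulary.
# The column is given as a plain string so the vectorizer receives 1-D text.
clmn = ColumnTransformer([
    ("desc", TfidfVectorizer(stop_words="english"), "description"),
    ("req", TfidfVectorizer(stop_words="english"), "requirements"),
])

# A single call fits both vectorizers and concatenates their outputs column-wise.
XX = clmn.fit_transform(X)

# Vocabulary sizes of the two fitted vectorizers.
n_desc = len(clmn.named_transformers_["desc"].vocabulary_)
n_req = len(clmn.named_transformers_["req"].vocabulary_)
print(XX.shape, n_desc, n_req)
```

If the real columns contain missing values, they would likely need to be filled first (for example with empty strings), since the vectorizer expects text.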