
How to use sklearn TfidfVectorizer fit_transform on two columns

I'm not sure whether this is the correct way to apply fit_transform to both of these columns. I'm currently writing a classifier to predict fraudulent job postings, and I'm interested in the 'description' and 'requirements' columns. Is there a way to do both transforms in a single line?

from sklearn.feature_extraction.text import TfidfVectorizer

preprocessor = TfidfVectorizer(stop_words='english', strip_accents='unicode', norm='l2', use_idf=False, smooth_idf=False)
XX = preprocessor.fit_transform(X["description"])
XX = preprocessor.fit_transform(X["requirements"])

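I think you are misinterpreting the documentation. In your code, the second fit_transform call refits the vectorizer on 'requirements' and overwrites XX, so the 'description' features are lost. If you want TF-IDF on two columns, you need two transformers, one per column, combined in a ColumnTransformer. Something like this: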

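from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# min_df=1 is the default; recent scikit-learn versions may reject an integer min_df of 0
tfidf_1 = TfidfVectorizer(min_df=1)
tfidf_2 = TfidfVectorizer(min_df=1)
# "a" and "b" are the names of the two text columns; pass each as a plain
# string (not a list) so the vectorizer receives 1-D input
clmn = ColumnTransformer([("tfidf_1", tfidf_1, "a"),
                          ("tfidf_2", tfidf_2, "b")
                         ],
                         remainder="passthrough")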
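
For completeness, here is a minimal end-to-end sketch of the same idea applied to the columns from the question. The toy DataFrame and the transformer names "tfidf_desc" and "tfidf_req" are made up for illustration, and the vectorizer settings are only an assumption; swap in whatever parameters you actually need.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the job-postings frame X.
X = pd.DataFrame({
    "description": ["earn money fast from home",
                    "senior python developer wanted"],
    "requirements": ["no experience needed",
                     "five years of python experience"],
})

# One vectorizer per column, so each column gets its own vocabulary.
# Each column is given as a plain string (not a list) so that the
# vectorizer receives a 1-D sequence of documents.
clmn = ColumnTransformer([
    ("tfidf_desc", TfidfVectorizer(stop_words="english"), "description"),
    ("tfidf_req", TfidfVectorizer(stop_words="english"), "requirements"),
])

# A single call fits both vectorizers and stacks their outputs side by side.
XX = clmn.fit_transform(X)

# Vocabulary sizes of the two fitted vectorizers.
n_desc = len(clmn.named_transformers_["tfidf_desc"].vocabulary_)
n_req = len(clmn.named_transformers_["tfidf_req"].vocabulary_)
print(XX.shape, n_desc, n_req)
```

The result has one row per posting, and its column count is the sum of the two vocabulary sizes, so nothing from 'description' is overwritten by 'requirements'. The fitted clmn can then be used with transform on new data or placed as the first step of a Pipeline in front of the classifier.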


 