Add tf-idf values as columns in a matrix

Question

from sklearn.feature_extraction.text import TfidfVectorizer

item = list(df['item1']) + list(df['item2'])
tfidf = TfidfVectorizer()
tfidf_sp = tfidf.fit_transform(item)

for i in len(list(df['item1'])):
    new_list =[]
    new_list.append(tfidf.idf_)
df['updated_item'] = list(new_list)

I am trying to add the tf-idf scores as features. Is this the correct way to do it?

item1 has shape (400k,), and item2 has the same shape. tfidf_sp has shape (800k, 100k).

Answer 1

import pandas as pd

# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
pd.DataFrame(tfidf_sp, columns = tfidf.get_feature_names_out())

This gives you a DataFrame whose columns are the tf-idf vocabulary terms and whose rows hold the tf-idf values for each item.

Hope this helps.

Edit:

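Try converting the sparse document-term matrix into a dense array as follows:

tfidf_sp = tfidf.fit_transform(item).toarray()

This resolves the pandas error. Be aware that a dense float64 array of shape (800k, 100k) needs roughly 640 GB of memory, so this is only practical on a much smaller matrix.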
