
Name "merged_df" is not defined in TF-IDF calculation

I have a dataset that contains a set of article papers. I merged the metadata and the JSON files and created a dataframe. Here is my code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import pandas as pd
import numpy as np 

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(merged_df['Title'][39100])

print(X.shape)

query = "How to prevent covid19"
query_vec = vectorize.transform([query])
result = cosine_similarity(X,query_vec).reshape((-1,))

for i in result.argsort()[-10:][::-1]:
    print(merged_df.iloc['Title'][i,0], "--", merged_df.iloc['Title'][i,1])

I want to compute TF-IDF over the titles so that I can match a query against them and find relevant papers. Why do I get the error: name "merged_df" is not defined?

merged_df is not defined anywhere in the code you posted. The dataframe is never created there, so the name is undefined when the vectorizer tries to use it.
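To illustrate, here is a minimal sketch of how the snippet might look once merged_df exists. The small dataframe below is a hypothetical stand-in (the real one would come from your metadata/JSON merge), and rank_titles is a helper name made up for this example. The sketch also assumes a few other adjustments the posted code would likely need: the vectorizer is fitted on the whole Title column rather than a single title string, the query is transformed with vectorizer (the posted code calls vectorize, which is a different, undefined name), and rows are looked up with positional .iloc indexing.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the real merged dataframe; in the original
# workflow this would be built by merging the metadata with the JSON files.
merged_df = pd.DataFrame({
    "Title": [
        "How to prevent the spread of covid19 in hospitals",
        "Deep learning for image classification",
        "Vaccination strategies to prevent influenza",
    ]
})

def rank_titles(df, query, top_n=10):
    """Return up to top_n rows of df, ordered by TF-IDF cosine similarity to query."""
    vectorizer = TfidfVectorizer()
    # Fit on the whole Title column: one document per row.
    X = vectorizer.fit_transform(df["Title"].fillna(""))
    # Reuse the same fitted vectorizer for the query.
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(X, query_vec).ravel()
    # Positions of the highest scores, best first.
    top = scores.argsort()[::-1][:top_n]
    return df.iloc[top].assign(score=scores[top])

results = rank_titles(merged_df, "How to prevent covid19")
for title, score in zip(results["Title"], results["score"]):
    print(f"{score:.3f} -- {title}")
```

If the dataframe is created in another notebook cell or script, that cell has to run (or the script has to be imported) before this one, otherwise the same "not defined" error appears again.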
