I've built a count vectorizer with cosine similarity. Next, I want a confusion matrix so I can get precision and accuracy,
but I don't know how to do it. I'd really appreciate your answers, even if they are just steps.
Let me know if anything is wrong or lacking in my description of the problem.
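Count vectorizer code:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Turn each movie's tags into a bag-of-words count vector
c_vectorizer = CountVectorizer()
c_vectorized = c_vectorizer.fit_transform(dataset_with_tags.movie_tags)
# Pairwise movie-to-movie cosine similarity
c_vectorized_m2m = pd.DataFrame(cosine_similarity(c_vectorized))
c_vectorized_m2m.head(10)
# Reshape to long format: one row per (first_movie, second_movie) pair
c_vectorized_m2m_similarity = c_vectorized_m2m.stack().reset_index()
c_vectorized_m2m_similarity.columns = ['first_movie', 'second_movie', 'similarity_score']
c_vectorized_m2m_similarity.head(10)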
You seem to be confused about the confusion matrix: it is used when you can compare actual vs. predicted values for a classification problem, which gives you a ground truth (TRUE/FALSE) as to whether or not categories were properly identified. E.g. how to generate a confusion matrix from the results with a classifier.
https://en.wikipedia.org/wiki/Confusion_matrix
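As a minimal sketch of what "actual vs. predicted" means in practice, the snippet below builds a confusion matrix with scikit-learn and derives accuracy and precision from it. The labels are made up for illustration (1 = "comedy", 0 = "not comedy") and do not come from the question's dataset.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

# Hypothetical ground truth and predictions: 1 = "comedy", 0 = "not comedy"
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)

print(cm)
print(accuracy, precision)
```

Both metrics need a list of true labels to compare against, which is exactly what a similarity matrix does not have.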
Similarity matrices don't categorize; they just provide continuous values from 0 to 1 representing how similar two things are. There is no classification, so you cannot use a confusion matrix.
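To make the "continuous values" point concrete, here is a small sketch that runs the same CountVectorizer plus cosine similarity pipeline on three made-up tag strings (the tags are assumptions, not the question's data). The result is a matrix of scores, not a set of predicted categories.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag strings standing in for dataset_with_tags.movie_tags
tags = ["comedy romance", "comedy action", "drama war"]

vectors = CountVectorizer().fit_transform(tags)
similarity = cosine_similarity(vectors)

# Each entry is a score between 0 and 1; the diagonal is each movie
# compared with itself.
print(np.round(similarity, 2))
```

Movies sharing one of two tags score in between 0 and 1, and movies with no tags in common score 0. Nothing here is "right" or "wrong", so there is nothing to count in a confusion matrix.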
Whether you want to use a similarity matrix (how similar two items are) or a classifier (e.g. whether a movie is a "comedy" or a "drama"; movies can have several genres, e.g. "romantic comedy", so you would need a multi-label classifier), you need some test data to assess the performance of your model.
If the movie_tags from your dataset are accurate, you can use those to train your classifier and predict tags for movies which are not in your dataset (you can always use a similarity matrix later on to recommend similar movies based on those predicted tags).
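The classifier route could look roughly like the sketch below. It is only an illustration under several assumptions: the `descriptions` and `genres` lists are invented, each movie is simplified to a single tag (a real multi-genre setup would need a multi-label approach), and naive Bayes is just one convenient choice of model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical data: a short description per movie (features)
# and one trusted tag per movie (labels).
descriptions = [
    "funny friends road trip", "hilarious wedding mix up",
    "silly office pranks", "awkward family holiday laughs",
    "grieving widow rebuilds life", "soldier returns from war",
    "family torn apart by illness", "lawyer fights injustice",
]
genres = ["comedy"] * 4 + ["drama"] * 4

# Hold out test data so the model is scored on movies it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    descriptions, genres, test_size=0.25, stratify=genres, random_state=0
)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Now actual vs. predicted labels exist, so a confusion matrix makes sense.
labels = ["comedy", "drama"]
cm = confusion_matrix(y_test, y_pred, labels=labels)
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print(accuracy)
```

With a dataset this small the scores mean little; the point is the workflow of splitting, training, predicting, and then comparing predictions with the held-out tags.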