
What are the steps to build a confusion matrix?

I have applied CountVectorizer with cosine similarity. Next, I want to build a confusion matrix to get precision and accuracy.

But I don't know how to do it. I would really appreciate your answers, even if they are just the steps.

Let me know if anything in my description of the problem is wrong or missing.

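Count Vectorizer code:

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    # Bag-of-words counts per movie, then pairwise cosine similarity between movies
    c_vectorizer = CountVectorizer()
    c_vectorized = c_vectorizer.fit_transform(dataset_with_tags.movie_tags)
    c_vectorized_m2m = pd.DataFrame(cosine_similarity(c_vectorized))
    c_vectorized_m2m.head(10)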


    c_vectorized_m2m_similarity = c_vectorized_m2m.stack().reset_index()
    c_vectorized_m2m_similarity.columns = ['first_movie', 'second_movie', 'similarity_score']
    c_vectorized_m2m_similarity.head(10)


You seem to be confused about the confusion matrix: it is used when you can compare actual vs. predicted values for a classification problem, which gives you a ground truth (true/false) as to whether each category was correctly identified. For example, a confusion matrix can be generated from the results of a classifier.

https://en.wikipedia.org/wiki/Confusion_matrix
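To make the "actual vs. predicted" idea concrete, here is a minimal sketch using scikit-learn's metrics. The labels `y_true` and `y_pred` are made up for illustration (1 = "comedy", 0 = "not comedy"); they are not taken from the question's dataset.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

# Hypothetical ground-truth and predicted labels (1 = comedy, 0 = not comedy).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)

print(cm)
print(accuracy, precision)
```

Both metrics only make sense because every sample has a discrete true label and a discrete predicted label.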

Similarity matrices don't categorize; they just give you continuous values (from 0 to 1 here) representing how similar two things are. There is no classification, so you cannot use a confusion matrix.
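As a small illustration, assuming a few made-up tag strings in place of `dataset_with_tags.movie_tags`, the cosine similarity matrix contains scores rather than class labels:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag strings standing in for dataset_with_tags.movie_tags.
tags = ["comedy romance", "comedy romance", "comedy action", "horror thriller"]

counts = CountVectorizer().fit_transform(tags)
sim = cosine_similarity(counts)

# Each entry is a continuous score between 0 and 1, not a predicted category.
print(sim.round(2))
```

Movies 0 and 1 share all their tags, movies 0 and 2 share half of them, and movies 0 and 3 share none, so the scores fall at different points of the 0 to 1 range.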

Whether you want to use a similarity matrix (how similar are two items) or a classifier (e.g. whether a movie is a "comedy" or a "drama"; since a movie can have several genres, e.g. "romantic comedy", you will need a multi-label classifier), you need some test data to assess the performance of your model:

  • Similarity matrix: build a list of movie pairs that you know are similar or dissimilar, and expect your matrix to return values close to 1 or 0 respectively.
  • Classifier: assuming the movie_tags in your dataset are accurate, you can use them to train your classifier and predict tags for movies that are not in your dataset (you can always use a similarity matrix later on to recommend similar movies based on those predicted tags).
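The first option can be sketched as follows. The tag strings and the hand-labelled pairs in `labelled_pairs` are hypothetical, and the mean absolute error is just one possible way to summarize how far the scores are from the expected 1/0 values.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag strings standing in for dataset_with_tags.movie_tags.
tags = [
    "comedy romance",
    "comedy romance wedding",
    "horror thriller",
    "horror slasher thriller",
]
sim = cosine_similarity(CountVectorizer().fit_transform(tags))

# Hypothetical test data: (movie index, movie index, expected 1 = similar / 0 = dissimilar).
labelled_pairs = [(0, 1, 1), (2, 3, 1), (0, 2, 0), (1, 3, 0)]

scores = np.array([sim[i, j] for i, j, _ in labelled_pairs])
expected = np.array([label for _, _, label in labelled_pairs])

# Average distance between each similarity score and its expected value.
mean_abs_error = np.abs(scores - expected).mean()
print(scores.round(3), mean_abs_error.round(3))
```

A lower error means the similarity scores agree more closely with your hand-labelled pairs; in practice the test list would need to be much larger than four pairs to be meaningful.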
