I'm trying to find a way to measure the accuracy of my recommender system. I built a KNN model on a User x Movies matrix, where each entry is the rating that a given user gave to a given movie. Based on that model, I have a function that takes a movie title and returns the K movies most similar to it. However, I don't know how to measure whether the model is accurate, that is, whether the movies it returns are really similar to the one I used as input. Any ideas?
Here is a sample of the dataset I'm using
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def create_sparse_matrix(df):
    # Rows are userId, columns are movieId, values are the ratings
    sparse_matrix = sparse.csr_matrix((df["rating"], (df["userId"], df["movieId"])))
    return sparse_matrix

# Transpose so that each row is a movie - data_cf is the DataFrame that I'm using
user_movie_matrix = create_sparse_matrix(data_cf).transpose()
# N_NEIGHBORS is a constant defined elsewhere in my code
knn_cf = NearestNeighbors(n_neighbors=N_NEIGHBORS, algorithm='auto', metric='cosine')
knn_cf.fit(user_movie_matrix)

# Function that prints movie recommendations based on an input movie.
def get_recommendations_cf(movie_name, model):
    # Get the ID of the movie based on its title
    movieId = data_cf.loc[data_cf["title"] == movie_name]["movieId"].values[0]
    distances, suggestions = model.kneighbors(user_movie_matrix.getrow(movieId).todense().tolist(), n_neighbors=10)
    for i in range(0, len(distances.flatten())):
        if i == 0:
            # The first neighbor is normally the input movie itself, so print a header instead
            print('Recomendações para {0}: \n'.format(movie_name))
        else:
            print('{0}: {1}, com distância de {2}:'.format(i, data_cf.loc[data_cf["movieId"] == suggestions.flatten()[i]]["title"].values[0], distances.flatten()[i]))
    return distances, suggestions
Calling the recommender function and showing the "distance" of each movie recommended
Translating:
"Recomendações para Spider-Man 2: " = "Recommendations for Spider-Man 2: "
"1: Spider-Man, com distância de 0.30051949781903664" = "1: Spider-Man, with distance of 0.30051949781903664"
...
"9: Finding Nemo, com distância de 0.4844064554284505:" = "9: Finding Nemo, with distance of 0.4844064554284505:"
When it comes to recommendation systems, measuring performance is never straightforward. That is because there are several desirable characteristics we look for in a recommendation: accuracy, diversity, novelty, and so on, each of which can be measured in one way or another. There are many helpful articles on the web that cover the topic. I will link a few references that deal with precision specifically:
Bear in mind that to do any sort of evaluation you need to split your data into a train set and a test set. In the case of recommender systems, since all users and all items must be represented in both sets, you must use a stratified approach. That means you should set aside a percentage of the movie ratings of each user, instead of simply sampling rows of your dataset at random.
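As a rough illustration of the per-user split described above, here is a minimal sketch. It assumes a ratings DataFrame with the `userId`, `movieId` and `rating` columns from the question, a unique index, and pandas 1.1 or later (for `groupby(...).sample`). The function names `split_per_user` and `precision_at_k`, the 20% test fraction and the toy data are all hypothetical choices made for the example, not part of the original code.

```python
import pandas as pd


def split_per_user(ratings, test_frac=0.2, seed=0):
    """Hold out a fraction of each user's ratings as the test set."""
    # Sample inside each user's group instead of over the whole table,
    # so the held-out rows are spread across users.
    test = ratings.groupby("userId").sample(frac=test_frac, random_state=seed)
    # Everything that was not sampled stays in the training set
    # (this relies on the DataFrame index being unique).
    train = ratings.drop(test.index)
    return train, test


def precision_at_k(recommended, relevant, k=10):
    """Share of the first k recommended items that appear in `relevant`."""
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Dividing by k is one common convention; others divide by len(top_k).
    return hits / k


# Toy data (made up for illustration): two users with five ratings each.
toy_ratings = pd.DataFrame({
    "userId": [1] * 5 + [2] * 5,
    "movieId": [10, 11, 12, 13, 14] * 2,
    "rating": [5.0, 4.0, 3.0, 4.5, 2.0, 3.5, 5.0, 1.0, 4.0, 4.5],
})
train_df, test_df = split_per_user(toy_ratings, test_frac=0.2)
```

You would then fit the KNN model on `train_df` only and, for each user, compare the movies the model suggests with the movies held out in `test_df`, for example by treating held-out movies the user rated highly as the `relevant` set passed to `precision_at_k`. What counts as "rated highly" is a threshold you have to choose yourself. Note that a movie appearing only in held-out rows would be missing from training, so in practice you may need to filter out rarely rated movies first.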