Collaborative Filtering Item-Based Recommender System Accuracy

I'm trying to find a way to measure the accuracy of my recommender system. I built a KNN model on a User x Movies matrix, where each entry is the rating that a given user gave to a given movie. On top of that model I have a function that takes a movie title and returns the K movies most similar to it. What I don't know is how to measure whether the model is accurate, that is, whether the movies it returns really are similar to the one I used as input. Any ideas?

Here is a sample of the dataset I'm using

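from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def create_sparse_matrix(df):
    # Build a CSR matrix with users as rows and movies as columns;
    # the raw userId / movieId values are used directly as indices
    sparse_matrix = sparse.csr_matrix((df["rating"], (df["userId"], df["movieId"])))

    return sparse_matrix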

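# Transpose so that each row is a movie and each column is a user
# (data_cf is the name of the DataFrame that I'm using).
# Despite its name, user_movie_matrix is therefore indexed as (movieId, userId).
user_movie_matrix = create_sparse_matrix(data_cf).transpose()

# N_NEIGHBORS is a constant that is not shown here
knn_cf = NearestNeighbors(n_neighbors=N_NEIGHBORS, algorithm='auto', metric='cosine')

knn_cf.fit(user_movie_matrix)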
# Function that returns movie recommendations for a given input movie.
def get_recommendations_cf(movie_name, model):
    # Look up the ID of the movie from its title
    movieId = data_cf.loc[data_cf["title"] == movie_name, "movieId"].values[0]

    # Query the model with the movie's (sparse) row of ratings
    distances, suggestions = model.kneighbors(user_movie_matrix.getrow(movieId), n_neighbors=10)
    flat_distances, flat_suggestions = distances.flatten(), suggestions.flatten()

    print('Recomendações para {0}: \n'.format(movie_name))
    # Index 0 is normally the input movie itself, so it is skipped
    for i in range(1, len(flat_distances)):
        title = data_cf.loc[data_cf["movieId"] == flat_suggestions[i], "title"].values[0]
        print('{0}: {1}, com distância de {2}:'.format(i, title, flat_distances[i]))

    return distances, suggestions

Calling the recommender function and showing the "distance" of each recommended movie

Translating:

"Recomendações para Spider-Man 2: " = "Recommendations for Spider-Man 2: "

"1: Spider-Man, com distância de 0.30051949781903664" = "1: Spider-Man, with distance of 0.30051949781903664"

...

"9: Finding Nemo, com distância de 0.4844064554284505:" = "9: Finding Nemo, with distance of 0.4844064554284505:"

When it comes to recommendation systems, measuring performance is never a straightforward task. That is because there are many desirable characteristics that we look for in a recommendation (accuracy, diversity, novelty, ...), all of which can be measured in one way or another. There are many very helpful articles on the web that cover the topic. I will link a few references that deal specifically with precision:

Bear in mind that any sort of evaluation requires splitting your data into a train set and a test set. In the case of recommender systems, since all users and all items must be represented in both sets, you must use a stratified approach. That means you should set aside a percentage of each user's movie ratings instead of simply sampling rows of your dataset.
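
A minimal sketch of that per-user holdout, together with a simple precision-at-k helper, might look like the following. It assumes the ratings live in a pandas DataFrame with `userId`, `movieId` and `rating` columns (like `data_cf` in the question) and that the DataFrame has a unique index. The names `holdout_per_user` and `precision_at_k`, and the toy data, are made up for illustration and do not come from any library.

```python
import numpy as np
import pandas as pd


def holdout_per_user(ratings, test_frac=0.2, seed=0):
    """Split ratings so that each user with >= 2 ratings appears in both sets."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for _, rows in ratings.groupby("userId"):
        if len(rows) < 2:
            continue  # a user with a single rating stays entirely in train
        # Hold out at least one rating, but always leave at least one in train
        n_test = max(1, min(int(len(rows) * test_frac), len(rows) - 1))
        test_idx.extend(rng.choice(rows.index.to_numpy(), size=n_test, replace=False))
    test = ratings.loc[test_idx]
    train = ratings.drop(index=test_idx)
    return train, test


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended movie ids that are in the relevant set."""
    top_k = list(recommended)[:k]
    hits = sum(1 for movie_id in top_k if movie_id in relevant)
    return hits / k


# Toy data (hypothetical): user 1 has 5 ratings, user 2 has 3, user 3 has 1
toy = pd.DataFrame({
    "userId":  [1, 1, 1, 1, 1, 2, 2, 2, 3],
    "movieId": [10, 11, 12, 13, 14, 10, 11, 15, 10],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 4.0, 5.0, 4.0],
})
train, test = holdout_per_user(toy, test_frac=0.2)
```

One way to use this with an item-based model is to fit the KNN on `train` only, take the movies a user rated in `train` as inputs, and pass the returned neighbour ids to `precision_at_k` with that user's held-out `test` movies as the relevant set. Note that this sketch only stratifies by user; it does not guarantee that every movie still appears in the train set.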
