
Evaluate recommendations after training a model

Original post: 2020-10-24 07:39:28. Tags: python, tensorflow, keras, deep-learning, neural-network

First of all, I would like to create a recommender system. With the help of a neural network, it should predict which articles user X is most likely to buy.

I have already trained a model on suitable datasets using the neuMF model (the different layers are shown in the picture).

[Source: https://arxiv.org/abs/1708.05031]

My dataset contains the following:

The column event records whether the user looked at an item (view), placed it in the shopping cart (addtocart), or bought it (transaction).

I have already found example implementations that show how recommendations are determined. The following was written about one of them:

Now that I've trained my model, I'm ready to recommend songs for a given playlist. However, one issue that I encountered (see below) is that I need the embedding of that new playlist (as stored in my model) in order to find the closest relevant playlists in that embedding space using kmeans. I am not sure how to get around this issue: as is, it seems that I have to retrain my whole model each time I get an input playlist in order to get that playlist embedding. Therefore, I just test my model on a randomly chosen playlist (which happens to be rock and oldies, mostly!) from the training set.

To recommend songs, I first cluster the learned embeddings for all of the training playlists, and then select "neighbor" playlists for my given test playlist as all of the other playlists in that same cluster. I then take all of the tracks from these playlists and feed the test playlist embedding and these "neighboring" tracks into my model for prediction. This ranks the "neighboring" tracks by how likely they are (under my model) to occur next in the given test playlist.

[Source: https://github.com/caravanuden/spotify_recsys]

I've just trained my model and now I'd like to recommend the items that user X is most likely to buy. Do I have to implement another algorithm that determines, for example, the nearest neighbors (knn), or is it sufficient to train the model and then derive the recommendations from it?

How do I proceed after I have trained the model with the data? How do I get the recommendations from it? What is the state of the art for obtaining recommendations from a trained model?

Thanks in advance. Looking forward to suggestions, ideas and answers.

1 Answer

It depends on your use case for the model, for two reasons: first, the performance (speed) your specific use case requires, and second, what is in my opinion the main weakness of the neuMF model: if a user interacts with some more items, the predictions will not change, since those interactions were not part of the training. Because of this, if the model is used in a real-time online setting, the recommendations will essentially be based on previous behavior and will not take the current session into account unless the model is retrained.

The neuMF model is particularly good at batch predictions for recommendations made at regular intervals. If, for example, you would like to recommend items to users in a weekly email, then for each user you would predict the output probability for each item, select the top n (e.g. 10) probabilities, and recommend those items. (You would have to retrain the model the following week to get new predictions based on the users' latest item interactions.) So if there are 10000 unique items, you make 10000 individual predictions for each user and recommend n items based on those. The main drawback is of course that these 10000 predictions take a while to perform, so this approach might not be suitable for real-time online predictions. On the other hand, if you are clever about parallelizing the predictions, this limitation could be overcome as well, although that might be unnecessary. This is because, as explained previously, the predictions will not change based on current user interactions.
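The batch procedure described above (score every item for a user, keep the top n) could be sketched roughly as follows. This is only an illustration: `score_fn`, `top_n_for_user`, `batch_recommend` and `toy_score` are hypothetical names, and `toy_score` is a deterministic stand-in for a trained model, not the neuMF network itself.

```python
import heapq
from typing import Callable, Dict, List, Sequence

# Hypothetical scoring interface: takes parallel sequences of user ids and
# item ids and returns one score per (user, item) pair.
ScoreFn = Callable[[Sequence[int], Sequence[int]], Sequence[float]]


def top_n_for_user(score_fn: ScoreFn, user_id: int,
                   item_ids: Sequence[int], n: int = 10) -> List[int]:
    """Score every candidate item for one user and return the n best item ids."""
    users = [user_id] * len(item_ids)      # repeat the user once per item
    scores = score_fn(users, item_ids)     # one batched scoring call
    ranked = heapq.nlargest(n, zip(item_ids, scores), key=lambda pair: pair[1])
    return [item for item, _ in ranked]


def batch_recommend(score_fn: ScoreFn, user_ids: Sequence[int],
                    item_ids: Sequence[int], n: int = 10) -> Dict[int, List[int]]:
    """Run the top-n selection for every user, e.g. before a weekly email."""
    return {u: top_n_for_user(score_fn, u, item_ids, n) for u in user_ids}


def toy_score(users: Sequence[int], items: Sequence[int]) -> List[float]:
    # Stand-in for a trained model (assumption): pseudo-scores in [0, 1).
    return [((7 * u + 3 * i) % 10) / 10 for u, i in zip(users, items)]


if __name__ == "__main__":
    print(batch_recommend(toy_score, [1, 2], list(range(5)), n=2))
    # prints {1: [4, 0], 2: [1, 4]}
```

With a trained Keras model, `score_fn` might wrap a call such as `model.predict([users_array, items_array])`, assuming the model takes user ids and item ids as its two inputs; that input layout is an assumption and depends on how the model was built.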

Using knn to cluster users in the embedding space, then taking these users' items and feeding them into the model, seems unnecessary and, in my opinion, defeats the purpose of the whole model architecture. This is because the whole point of the neuMF model is to generalize a given user's interactions with items across all the other users' interactions and base the recommendations on that, so that, given a user and an item, you can get the probability for that specific item.
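To make the "given a user and an item, get a probability" idea concrete, here is a deliberately simplified sketch. It only mimics the generalized matrix factorization (GMF) branch of NeuMF (element-wise product of user and item embeddings, a weighted sum, then a sigmoid) and leaves out the MLP branch. The embedding values, `OUTPUT_WEIGHTS`, and the function name are all made up for illustration; a trained model would have learned its own.

```python
import math
from typing import Dict, List

# Toy embeddings (assumption): in a trained model these are learned parameters.
USER_EMB: Dict[int, List[float]] = {0: [0.5, -1.0], 1: [1.0, 2.0]}
ITEM_EMB: Dict[int, List[float]] = {10: [1.0, 1.0], 11: [-2.0, 0.5]}
OUTPUT_WEIGHTS: List[float] = [1.0, 1.0]  # hypothetical final-layer weights


def interaction_probability(user_id: int, item_id: int) -> float:
    """Return a score in (0, 1) for one (user, item) pair."""
    u = USER_EMB[user_id]
    v = ITEM_EMB[item_id]
    # Element-wise product of the two embeddings, weighted and summed.
    logit = sum(w * a * b for w, a, b in zip(OUTPUT_WEIGHTS, u, v))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


if __name__ == "__main__":
    for item in ITEM_EMB:
        print(item, round(interaction_probability(1, item), 4))
```

The score depends only on the user id and the item id, so ranking all items for a user needs no separate nearest-neighbor step; the similarity between users is already encoded in the learned embeddings.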