
How can I suggest movies based on someone's previously watched movies?

For a machine learning exercise I am working on, I am given a dataset where each row contains the following features:

  • the person's name,
  • age,
  • gender, and
  • the movie they watched.

My task is to suggest other movies that the person might like, based on these features.

The thing is, I am not given a feature set for movies. I am only given the dataset described above.

I already know I need to generate a feature set for movies. However, I don't know how to approach this.

After I create the feature set, I will convert each movie's feature set into an embedding (vector). Then I will use a similarity-matching library (such as Spotify's Annoy) to find and return the embeddings of similar movies.

The part I am stuck on is how to use the dataset to generate a feature set for each movie.
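One common way to get movie features without any movie metadata is item-based collaborative filtering, which the post itself does not mention: describe each movie by who watched it, so that movies watched by the same people get similar vectors. The sketch below assumes the dataset has several rows per person; the `ROWS` data and the function names are hypothetical, and a brute-force cosine similarity stands in for Annoy.

```python
import math
from collections import defaultdict

# Hypothetical watch log in the question's format: (name, age, gender, movie).
# A person appears once per movie they watched.
ROWS = [
    ("John", 23, "Male", "John the Ripper"),
    ("John", 23, "Male", "Armageddon"),
    ("Luke", 18, "Male", "The Star Wars"),
    ("Luke", 18, "Male", "Armageddon"),
    ("Ann", 18, "Female", "Mr. Nobody"),
    ("Ann", 18, "Female", "Alice in Wonderland"),
    ("Alice", 12, "Female", "Alice in Wonderland"),
    ("Alice", 12, "Female", "Mr. Nobody"),
]


def movie_vectors(rows):
    """Describe each movie by a 0/1 vector over people: 1 if that person watched it."""
    people = sorted({name for name, _, _, _ in rows})
    column = {name: i for i, name in enumerate(people)}
    vectors = defaultdict(lambda: [0.0] * len(people))
    for name, _age, _gender, movie in rows:
        vectors[movie][column[name]] = 1.0
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(person, rows, k=3):
    """Rank movies `person` has not seen by total similarity to the movies they have seen."""
    vectors = movie_vectors(rows)
    seen = {movie for name, _, _, movie in rows if name == person}
    scores = {
        movie: sum(cosine(vector, vectors[s]) for s in seen)
        for movie, vector in vectors.items()
        if movie not in seen
    }
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [movie for movie, score in ranked[:k] if score > 0]


print(recommend("John", ROWS))  # ['The Star Wars']
```

Here John gets The Star Wars because Luke watched both it and Armageddon. The vectors from `movie_vectors` could be loaded into Annoy instead of the brute-force loop, and age and gender could be appended to each vector (for example, the mean age of a movie's viewers).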

Imagine that you have a table like this:

+-------+-----+--------+---------------------+
| Name  | Age | Gender |        Movie        |
+-------+-----+--------+---------------------+
| John  |  23 | Male   | John the Ripper     |
| Luke  |  18 | Male   | The Star Wars       |
| Ann   |  18 | Female | Mr. Nobody          |
| Alice |  12 | Female | Alice in Wonderland |
| Bruce |  64 | Male   | Armageddon          |
+-------+-----+--------+---------------------+

I. First of all, you need to split this table into two parts:

  1. The feature vector, which contains the Name, Age, and Gender columns.
  2. The purpose (target) vector, which contains only the Movie column.
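Step I can be sketched in plain Python as below; this is a minimal sketch, and the variable names are illustrative rather than taken from the answer.

```python
# The example table, one row per viewing.
TABLE = [
    {"Name": "John", "Age": 23, "Gender": "Male", "Movie": "John the Ripper"},
    {"Name": "Luke", "Age": 18, "Gender": "Male", "Movie": "The Star Wars"},
    {"Name": "Ann", "Age": 18, "Gender": "Female", "Movie": "Mr. Nobody"},
    {"Name": "Alice", "Age": 12, "Gender": "Female", "Movie": "Alice in Wonderland"},
    {"Name": "Bruce", "Age": 64, "Gender": "Male", "Movie": "Armageddon"},
]
FEATURE_COLUMNS = ["Name", "Age", "Gender"]
PURPOSE_COLUMN = "Movie"

# Feature vectors: one list of feature values per row.
features = [[row[col] for col in FEATURE_COLUMNS] for row in TABLE]
# Purpose (target) vector: the movie watched in each row.
purpose = [row[PURPOSE_COLUMN] for row in TABLE]

print(features[0])  # ['John', 23, 'Male']
print(purpose)
```

If the table is loaded into a pandas DataFrame `df`, the same split is `df[["Name", "Age", "Gender"]]` and `df["Movie"]`.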

II. After that, you can encode your strings as numbers:

  1. The Name column is encoded as a unique integer index.
  2. The Age column stays unchanged.
  3. The Gender column is encoded as binary values (0, 1).
  4. The Movie column is encoded as unique integer index values.

For example:

+------+-----+--------+-------+
| Name | Age | Gender | Movie |
+------+-----+--------+-------+
|    0 |  23 |      1 |     3 |
|    1 |  18 |      1 |     2 |
|    2 |  18 |      0 |     4 |
|    3 |  12 |      0 |     1 |
|    4 |  64 |      1 |     0 |
+------+-----+--------+-------+
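Step II can be sketched as follows. The index assignment is arbitrary: this sketch numbers names and movies in order of first appearance, so its Movie column (0 to 4, top to bottom) differs from the table above, which uses a different but equally valid mapping. The helper `build_index` is hypothetical.

```python
# The example table: (Name, Age, Gender, Movie).
TABLE = [
    ("John", 23, "Male", "John the Ripper"),
    ("Luke", 18, "Male", "The Star Wars"),
    ("Ann", 18, "Female", "Mr. Nobody"),
    ("Alice", 12, "Female", "Alice in Wonderland"),
    ("Bruce", 64, "Male", "Armageddon"),
]


def build_index(values):
    """Assign each distinct value an integer index, in order of first appearance."""
    index = {}
    for value in values:
        index.setdefault(value, len(index))
    return index


name_index = build_index(name for name, _, _, _ in TABLE)
movie_index = build_index(movie for _, _, _, movie in TABLE)
GENDER_CODES = {"Female": 0, "Male": 1}  # binary encoding used in the table above

encoded = [
    [name_index[name], age, GENDER_CODES[gender], movie_index[movie]]
    for name, age, gender, movie in TABLE
]
# Keep the inverse mapping to turn predicted indices back into titles.
index_to_movie = {i: movie for movie, i in movie_index.items()}

for row in encoded:
    print(row)
```

scikit-learn's `LabelEncoder` does the same job for a single column, but assigns indices in sorted order.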

III. Then you can split your data into two parts:

  1. Training data, used to fit the machine learning algorithm (the first three rows).
  2. Test data, used to evaluate the trained algorithm (the last two rows).

The proportion between the two sets can vary, but the training set is usually larger than the test set.
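The split in step III can be sketched like this, using the encoded table from step II; `split_rows` is a hypothetical helper that, by default, keeps the rows in order, as the answer does.

```python
import random

# Encoded rows from step II: [Name, Age, Gender, Movie].
ENCODED = [
    [0, 23, 1, 3],
    [1, 18, 1, 2],
    [2, 18, 0, 4],
    [3, 12, 0, 1],
    [4, 64, 1, 0],
]


def split_rows(rows, test_fraction=0.4, shuffle=False, seed=0):
    """Split rows into (train, test); the last `test_fraction` of rows become the test set."""
    rows = list(rows)
    if shuffle:
        # Shuffle a copy with a fixed seed so the split is reproducible.
        random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


train, test = split_rows(ENCODED)
feature_train = [row[:3] for row in train]
purpose_train = [row[3] for row in train]
feature_test = [row[:3] for row in test]
purpose_test = [row[3] for row in test]
print(feature_train, purpose_train)
print(feature_test, purpose_test)
```

scikit-learn's `sklearn.model_selection.train_test_split` does the same job; it shuffles by default, which is usually advisable when the rows have any ordering.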

IV. Sometimes you may need to scale your data.

For example, dividing Age by its maximum (64) and the Name and Movie indices by the number of distinct values (5) gives:

+------+--------+--------+-------+
| Name |  Age   | Gender | Movie |
+------+--------+--------+-------+
| 0.0  | 0.3594 |      1 | 0.6   |
| 0.2  | 0.2813 |      1 | 0.4   |
| 0.4  | 0.2813 |      0 | 0.8   |
| 0.6  | 0.1875 |      0 | 0.2   |
| 0.8  | 1.0000 |      1 | 0.0   |
+------+--------+--------+-------+
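The scaled table can be reproduced as below. The scheme (Age divided by its maximum, Name and Movie indices divided by the number of distinct values) is inferred from the numbers rather than stated in the answer, and the table's four-decimal rounding of Age is left out. Min-max scaling, (x - min) / (max - min), is the more common choice.

```python
# Encoded rows from step II: [Name, Age, Gender, Movie].
ENCODED = [
    [0, 23, 1, 3],
    [1, 18, 1, 2],
    [2, 18, 0, 4],
    [3, 12, 0, 1],
    [4, 64, 1, 0],
]
N_NAMES = len({row[0] for row in ENCODED})   # 5 distinct names
N_MOVIES = len({row[3] for row in ENCODED})  # 5 distinct movies
MAX_AGE = max(row[1] for row in ENCODED)     # 64

# Name and Movie indices are divided by the number of distinct values,
# Age by its maximum; Gender is already 0/1 and stays as it is.
scaled = [
    [name / N_NAMES, age / MAX_AGE, gender, movie / N_MOVIES]
    for name, age, gender, movie in ENCODED
]

for row in scaled:
    print(row)
```

Scaling integer-coded categories such as Name and Movie imposes an artificial order on them; one-hot encoding is the usual alternative for categorical columns.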

In this example, after steps I-IV you get:

feature_train = [[0.0, 0.3594, 1], [0.2, 0.2813, 1], [0.4, 0.2813, 0]]
purpose_train = [0.6, 0.4, 0.8]
feature_test  = [[0.6, 0.1875, 0], [0.8, 1.0000, 1]]
purpose_test  = [0.2, 0.0]

That's all you need to prepare the data in a simple way.

[UPD]

After all these steps, you train your algorithm on the data, and then you can predict the favorite Movie for a chosen person from their Name, Age, and Gender.
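The training and prediction step can be sketched with a 1-nearest-neighbour classifier written in plain Python; this is a swapped-in technique, since the answer does not name an algorithm. One assumption: because Movie is a category, the sketch keeps the integer movie indices from step II as class labels instead of the scaled purpose values, since a classifier needs discrete labels. The `MOVIES` lookup reverses the encoding shown in the step II table.

```python
import math

# Scaled features from the example, with integer movie indices as class labels.
FEATURE_TRAIN = [[0.0, 0.3594, 1], [0.2, 0.2813, 1], [0.4, 0.2813, 0]]
MOVIE_TRAIN = [3, 2, 4]
FEATURE_TEST = [[0.6, 0.1875, 0], [0.8, 1.0, 1]]
MOVIES = {
    0: "Armageddon",
    1: "Alice in Wonderland",
    2: "The Star Wars",
    3: "John the Ripper",
    4: "Mr. Nobody",
}


def predict(x, features, labels):
    """1-nearest-neighbour: return the label of the closest training row (Euclidean distance)."""
    best = min(range(len(features)), key=lambda i: math.dist(x, features[i]))
    return labels[best]


for x in FEATURE_TEST:
    print(x, "->", MOVIES[predict(x, FEATURE_TRAIN, MOVIE_TRAIN)])
```

With scikit-learn, `KNeighborsClassifier(n_neighbors=1)` followed by `.fit(...)` and `.predict(...)` is a library equivalent.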

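Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.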

Related questions:

  • How to multi-label classify movies to film festivals based on its metadata, where the metadata is predominantly individual words?
  • Clustering Genres of Movies
  • In Surprise package for recommender systems, how to print out the recommended movies for a given user?
  • Trying to predict imdbRating of the movies on the basis of independent factors
  • How do I suggest tags to the user based only on the title of a list?
  • suggest list of how-to articles based on text content
  • How to specify the prior probability for scikit-learn's Naive Bayes
  • How can I predict values for the upcoming year based on previous year's trend and a few other variables?
  • Please suggest how can I extract text data from hand-filled character per box type forms using python
  • Can someone explain how image cropping works?
 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM