How can I recommend movies based on the movies someone has watched before?

Question

For a machine learning exercise I am working on, I am given a dataset where each row contains the following features:

the person's name,
age,
gender, and
the movie they watched.

My task is to suggest other movies that the person might like, based on these features.

The problem is that I am not given a feature set for the movies; I only have the dataset described above.

I already know I need to generate a feature set for the movies, but I don't know how to approach this.

Once I have the feature set, I will convert each movie's features into an embedding (a vector). Then I will use a similarity-matching library (such as Spotify's Annoy) to find and return the embeddings of similar movies.
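A library such as Annoy performs approximate nearest-neighbour search over vectors. As a minimal sketch of what such a lookup returns, the exact brute-force version below ranks movies by cosine similarity. The embedding values are hypothetical placeholders, and cosine_similarity and most_similar are illustrative helpers, not Annoy's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_title, embeddings, k=2):
    """Return the k titles whose embeddings are closest (by cosine) to query_title's."""
    query = embeddings[query_title]
    scored = [
        (cosine_similarity(query, vector), title)
        for title, vector in embeddings.items()
        if title != query_title
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical 3-dimensional movie embeddings, for illustration only.
movie_embeddings = {
    "The Star Wars": [0.9, 0.1, 0.0],
    "Armageddon": [0.8, 0.2, 0.1],
    "Alice in Wonderland": [0.1, 0.9, 0.2],
    "Mr. Nobody": [0.2, 0.3, 0.9],
}

print(most_similar("The Star Wars", movie_embeddings))  # ['Armageddon', 'Mr. Nobody']
```

Brute force compares the query with every movie, which is fine for a small catalogue; an approximate index like Annoy trades a little accuracy for much faster queries on large ones.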

The part I am stuck on is how to use this dataset to generate a feature set for each movie.
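One possible way to derive a per-movie feature set from rows like these (an assumption, not what the answer below does) is to describe each movie by its audience: the average viewer age, the share of male viewers, and a 0/1 entry per person saying whether they watched it. The rows and the helper name movie_features are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the question's format: (name, age, gender, movie).
rows = [
    ("John", 23, "Male", "John the Ripper"),
    ("John", 23, "Male", "Armageddon"),
    ("Luke", 18, "Male", "The Star Wars"),
    ("Luke", 18, "Male", "Armageddon"),
    ("Ann", 18, "Female", "Mr. Nobody"),
    ("Ann", 18, "Female", "Alice in Wonderland"),
]


def movie_features(rows):
    """Build one feature vector per movie from the people who watched it.

    Layout: [mean viewer age / max age, share of male viewers, watched-by flags...],
    with one watched-by flag per person, in alphabetical order of names.
    """
    people = sorted({name for name, _, _, _ in rows})
    person_index = {name: i for i, name in enumerate(people)}
    max_age = max(age for _, age, _, _ in rows)

    person_info = {}
    viewers = defaultdict(set)
    for name, age, gender, movie in rows:
        person_info[name] = (age, gender)
        viewers[movie].add(name)

    features = {}
    for movie, names in viewers.items():
        ages = [person_info[n][0] for n in names]
        male_share = sum(person_info[n][1] == "Male" for n in names) / len(names)
        watched = [0.0] * len(people)
        for n in names:
            watched[person_index[n]] = 1.0
        features[movie] = [sum(ages) / len(ages) / max_age, male_share] + watched
    return features


features = movie_features(rows)
print(features["Armageddon"])
```

Each resulting vector can then serve as the movie's embedding for a similarity search; movies watched by the same people end up with similar watched-by flags.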

Answer 1

Imagine that you have a table like this:

+-------+-----+--------+---------------------+
| Name  | Age | Gender |        Movie        |
+-------+-----+--------+---------------------+
| John  |  23 | Male   | John the Ripper     |
| Luke  |  18 | Male   | The Star Wars       |
| Ann   |  18 | Female | Mr. Nobody          |
| Alice |  12 | Female | Alice in Wonderland |
| Bruce |  64 | Male   | Armageddon          |
+-------+-----+--------+---------------------+

I. First, split this table into two parts:

The feature vector, which contains the Name, Age and Gender columns.
The purpose (target) vector, which contains only the Movie column.
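In Python, with the example table held as a list of tuples (the names table, features and purpose are illustrative), step I amounts to slicing each row:

```python
# The example table as (Name, Age, Gender, Movie) tuples.
table = [
    ("John", 23, "Male", "John the Ripper"),
    ("Luke", 18, "Male", "The Star Wars"),
    ("Ann", 18, "Female", "Mr. Nobody"),
    ("Alice", 12, "Female", "Alice in Wonderland"),
    ("Bruce", 64, "Male", "Armageddon"),
]

# Feature vector: Name, Age and Gender; purpose (target) vector: only Movie.
features = [row[:3] for row in table]
purpose = [row[3] for row in table]
```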

II. Next, encode your strings as numbers:

The Name column is encoded as a unique index.
The Age column stays unchanged.
The Gender column is encoded as a binary value (0 or 1).
The Movie column is encoded as a unique index.

For example:

+------+-----+--------+-------+
| Name | Age | Gender | Movie |
+------+-----+--------+-------+
|    0 |  23 |      1 |     3 |
|    1 |  18 |      1 |     2 |
|    2 |  18 |      0 |     4 |
|    3 |  12 |      0 |     1 |
|    4 |  64 |      1 |     0 |
+------+-----+--------+-------+
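A minimal sketch of step II in plain Python, assuming indices are assigned in order of first appearance (index_encode is a hypothetical helper). That rule reproduces the Name column above; the Movie column in the table uses a different order, which is equally valid as long as it is applied consistently. The Male = 1, Female = 0 mapping is taken from the table.

```python
def index_encode(values):
    """Map each distinct value to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return [mapping[value] for value in values], mapping


GENDER_CODES = {"Female": 0, "Male": 1}  # binary encoding used in the table

names = ["John", "Luke", "Ann", "Alice", "Bruce"]
ages = [23, 18, 18, 12, 64]  # Age is passed through unchanged
genders = ["Male", "Male", "Female", "Female", "Male"]
movies = ["John the Ripper", "The Star Wars", "Mr. Nobody", "Alice in Wonderland", "Armageddon"]

name_codes, name_mapping = index_encode(names)
gender_codes = [GENDER_CODES[g] for g in genders]
movie_codes, movie_mapping = index_encode(movies)  # keep movie_mapping to decode predictions

encoded = list(zip(name_codes, ages, gender_codes, movie_codes))
print(encoded[0])  # (0, 23, 1, 0)
```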

III. Then split your data into two parts:

Training data, which you feed to the machine learning algorithm (rows 0:3, i.e. the first three rows).
Test data, which you use to check the trained algorithm (rows 3:5, i.e. the last two rows).

The ratio between the two sets can vary, but the training set is usually larger than the test set.
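A sketch of step III, assuming a hypothetical helper split_rows that keeps the row order by default (as in the example) and can optionally shuffle first with a fixed seed, which is usually advisable on real data:

```python
import random


def split_rows(rows, train_fraction=0.6, shuffle=False, seed=0):
    """Split rows into (train, test); train gets about train_fraction of the rows."""
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Encoded rows from the table above: (Name, Age, Gender, Movie).
encoded = [(0, 23, 1, 3), (1, 18, 1, 2), (2, 18, 0, 4), (3, 12, 0, 1), (4, 64, 1, 0)]
train, test = split_rows(encoded)
print(len(train), len(test))  # 3 2
```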

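IV. Sometimes you may also need to scale your data.

For example, divide Name and Movie by the number of distinct values (5) and Age by the largest age (64), leaving Gender as 0 or 1: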

+------+--------+--------+-------+
| Name |  Age   | Gender | Movie |
+------+--------+--------+-------+
| 0.0  | 0.3594 |      1 | 0.6   |
| 0.2  | 0.2813 |      1 | 0.4   |
| 0.4  | 0.2813 |      0 | 0.8   |
| 0.6  | 0.1875 |      0 | 0.2   |
| 0.8  | 1.0000 |      1 | 0.0   |
+------+--------+--------+-------+
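The scaled table above can be reproduced by dividing each column by a constant. In this sketch (scale_rows and its divisor parameters are hypothetical), Name and Movie are divided by 5 and Age by 64, while Gender is already 0 or 1. The table rounds to four decimals, so 18 / 64 = 0.28125 appears there as 0.2813.

```python
def scale_rows(encoded_rows, name_divisor, age_divisor, movie_divisor):
    """Divide Name, Age and Movie by fixed constants; Gender (0/1) is left as is."""
    return [
        (name / name_divisor, age / age_divisor, gender, movie / movie_divisor)
        for name, age, gender, movie in encoded_rows
    ]


# Encoded rows from step II: (Name, Age, Gender, Movie).
encoded = [(0, 23, 1, 3), (1, 18, 1, 2), (2, 18, 0, 4), (3, 12, 0, 1), (4, 64, 1, 0)]

# 5 distinct names and 5 distinct movies; 64 is the largest age.
max_age = max(row[1] for row in encoded)
scaled = scale_rows(encoded, name_divisor=5, age_divisor=max_age, movie_divisor=5)
for row in scaled:
    print(row)
```

On real data, compute the divisors once and reuse the same values for any new rows, so that training and prediction inputs are on the same scale.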

For this sample, after steps I-IV you get:

feature_train = [[0.0, 0.3594, 1], [0.2, 0.2813, 1], [0.4, 0.2813, 0]]
purpose_train = [0.6, 0.4, 0.8]
feature_test  = [[0.6, 0.1875, 0], [0.8, 1.0000, 1]]
purpose_test  = [0.2, 0.0]

That is all you need to prepare the data in a simple way.

[UPD]

After all these steps, train your algorithm on the data. You can then predict the favorite movie of a chosen person from their Name, Age and Gender.
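The answer does not name a particular algorithm. As an illustrative stand-in (an assumption), the sketch below uses a 1-nearest-neighbour classifier on the example vectors: it predicts the purpose (Movie) value of the closest training row and decodes it back to a title using the indices from the encoded table. predict_nearest, decode_movie and MOVIE_TITLES are hypothetical names; math.dist needs Python 3.8 or later.

```python
import math

feature_train = [[0.0, 0.3594, 1], [0.2, 0.2813, 1], [0.4, 0.2813, 0]]
purpose_train = [0.6, 0.4, 0.8]
feature_test = [[0.6, 0.1875, 0], [0.8, 1.0000, 1]]
purpose_test = [0.2, 0.0]

# Movie indices from the encoded table; the scaled values were index / 5.
MOVIE_TITLES = {
    0: "Armageddon",
    1: "Alice in Wonderland",
    2: "The Star Wars",
    3: "John the Ripper",
    4: "Mr. Nobody",
}


def predict_nearest(train_x, train_y, x):
    """1-nearest-neighbour: return the purpose value of the training row closest to x."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def decode_movie(scaled_value, n_movies=5):
    """Undo the /5 scaling and look up the movie title."""
    return MOVIE_TITLES[round(scaled_value * n_movies)]


predictions = [predict_nearest(feature_train, purpose_train, x) for x in feature_test]
print([decode_movie(p) for p in predictions])  # ['Mr. Nobody', 'The Star Wars']
```

On this toy data neither prediction matches purpose_test, because each movie appears only once and the test movies never occur in the training set; the approach only becomes useful with many rows per movie.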

Solution 1 (0 votes), 2020-01-16 02:39:49