Rule-based recommendation system

Question

I have a data frame containing 3 friends and 10 restaurants. Each friend ranks every restaurant from 1 to 10 in an InterestRank column, where rank 1 means most likely to be interested and rank 10 means least likely. The data frame also contains restaurant attributes: Cost, Cuisine, and whether Alcohol is served. It looks like this:

FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No

I want to recommend the top 4 restaurants to each friend according to their InterestRank, subject to the condition that no more than 2 restaurants with the same cuisine type are recommended to any one friend. How can I achieve this in a Pythonic way?

Edit: Expected output data frame

I want the final output to look something like this:

FriendName	Restaurant	RecommendationRank
Amy	R2	1
Amy	R1	2
Amy	R10	3
Amy	R7	4
Ben	R7	1
Ben	R2	2
Ben	R5	3
Ben	R8	4
Cathy	R5	1
Cathy	R8	2
Cathy	R6	3
Cathy	R9	4

Answer 1

Solution

We can use sort_values and groupby, together with head, to get this kind of window-function behaviour on a pandas.DataFrame: first keep the top 2 rows per (friend, cuisine) group, then keep the top 4 of the remaining rows per friend.

from io import StringIO
import pandas as pd

input_data = """
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
""".strip()

# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))

# Use sorting and grouping, along with `head`,
# to achieve the desired window functions
result = (
    df
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)

print(result)

The result:

   FriendName Restaurant  InterestRank   Cuisine Cost Alcohol
0         Amy         R2             1    French   $$      No
1         Amy         R1             2    French  $$$     Yes
2         Amy        R10             5   Mexican  $$$     Yes
3         Amy         R7             6  Japanese   $$     Yes
4         Ben         R7             1  Japanese   $$     Yes
5         Ben         R5             2  Japanese   $$      No
6         Ben         R2             3    French   $$      No
7         Ben         R8             4   Mexican   $$      No
8       Cathy         R5             1  Japanese   $$      No
9       Cathy         R8             2   Mexican   $$      No
10      Cathy         R6             3  Japanese    $      No
11      Cathy         R9             4   Mexican   $$      No
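
The question's expected output keeps only FriendName, Restaurant and a per-friend RecommendationRank. A minimal sketch of deriving that shape, assuming a frame laid out like the result above: `groupby(...).cumcount()` numbers the rows within each friend starting from 0, so adding 1 gives a 1-based rank. The `picked` frame below is a hand-copied subset of the result (Amy and Ben only) so that the block runs on its own; `picked` and `recommendations` are illustrative names.

```python
import pandas as pd

# Hand-copied subset of the result above (Amy and Ben only),
# so this sketch is self-contained.
picked = pd.DataFrame({
    "FriendName": ["Amy", "Amy", "Amy", "Amy", "Ben", "Ben", "Ben", "Ben"],
    "Restaurant": ["R2", "R1", "R10", "R7", "R7", "R5", "R2", "R8"],
    "InterestRank": [1, 2, 5, 6, 1, 2, 3, 4],
})

recommendations = (
    picked
    # Make sure rows are in interest order within each friend
    .sort_values(by=["FriendName", "InterestRank"])
    # cumcount() counts rows within each friend from 0; add 1 for a 1-based rank
    .assign(RecommendationRank=lambda d: d.groupby("FriendName").cumcount() + 1)
    .loc[:, ["FriendName", "Restaurant", "RecommendationRank"]]
    .reset_index(drop=True)
)

print(recommendations)
```

Because the rank follows InterestRank, Ben's list comes out as R7, R5, R2, R8, whereas the table in the question lists R2 before R5.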

Edit: solution to additional request in comments

What if we want to apply 2 conditions instead? That is, each friend should be recommended no more than 2 restaurants with the same cuisine type and also no more than 2 restaurants with "No" in Alcohol.

# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))

# Take top 2 "no alcohol" restaurants per friend
no_df = (
    df[df.Alcohol == 'No']
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(2)
)
# Take top 4 alcohol-serving restaurants per friend
# (we don't mind if ultimately all 4 are alcohol-serving restaurants
# in the final result, as there is no restriction on these)
yes_df = (
    df[df.Alcohol == 'Yes']
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
)
# Concatenate and then proceed as before
result = (
    pd.concat([no_df, yes_df], axis=0)
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)

print(result)

The result:

   FriendName Restaurant  InterestRank   Cuisine Cost Alcohol
0         Amy         R2             1    French   $$      No
1         Amy         R1             2    French  $$$     Yes
2         Amy        R10             5   Mexican  $$$     Yes
3         Amy         R6             7  Japanese    $      No
4         Ben         R7             1  Japanese   $$     Yes
5         Ben         R5             2  Japanese   $$      No
6         Ben         R2             3    French   $$      No
7         Ben         R4             5    French  $$$     Yes
8       Cathy         R5             1  Japanese   $$      No
9       Cathy         R8             2   Mexican   $$      No
10      Cathy         R1             5    French  $$$     Yes
11      Cathy         R3             6    French   $$     Yes
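
One caveat with the pre-filtering approach above: each cap is applied in a separate pass, so a restaurant dropped in an early pass cannot come back when a later pass frees a slot. For Amy, R7 (rank 6, Japanese, serves alcohol) is cut by the "top 4 alcohol" step, and her final list ends with R6 (rank 7) instead. A possible alternative, offered as a sketch and not as the answerer's method, is a single greedy pass: walk each friend's restaurants in rank order and accept one only if every cap still has room. `CAPS` and `pick_greedy` are hypothetical names introduced here, and the data is Amy's rows only so the block is self-contained.

```python
from collections import Counter

import pandas as pd

# Amy's rows from the question, so the sketch runs on its own.
amy = pd.DataFrame({
    "FriendName": ["Amy"] * 10,
    "Restaurant": ["R2", "R1", "R4", "R3", "R10", "R7", "R6", "R8", "R5", "R9"],
    "InterestRank": list(range(1, 11)),
    "Cuisine": ["French"] * 4 + ["Mexican", "Japanese", "Japanese",
                                  "Mexican", "Japanese", "Mexican"],
    "Alcohol": ["No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
})

# (column, value, limit); value None means "cap every distinct value".
CAPS = [
    ("Cuisine", None, 2),  # at most 2 restaurants of any single cuisine
    ("Alcohol", "No", 2),  # at most 2 restaurants that serve no alcohol
]


def pick_greedy(group, caps=CAPS, n=4):
    """Hypothetical helper: take rows in InterestRank order,
    skipping any row that would push a cap over its limit."""
    counts = Counter()
    keep = []
    for idx, row in group.sort_values("InterestRank").iterrows():
        # Caps that apply to this particular row
        hits = [(col, row[col], limit) for col, value, limit in caps
                if value is None or row[col] == value]
        if all(counts[(col, v)] < limit for col, v, limit in hits):
            keep.append(idx)
            counts.update((col, v) for col, v, _ in hits)
        if len(keep) == n:
            break
    return group.loc[keep]


result = (
    pd.concat([pick_greedy(g) for _, g in amy.groupby("FriendName")])
    .reset_index(drop=True)
)
print(result[["FriendName", "Restaurant", "InterestRank", "Cuisine", "Alcohol"]])
```

On Amy's rows this selects R2, R1, R10 and R7. For Ben and Cathy, a greedy pass over the question's data gives the same picks as the result shown above. The explicit loop is slower than chained `groupby(...).head(...)` calls, but for a handful of friends and restaurants that should not matter, and adding a further rule is one more entry in `CAPS`.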

Answer 1 (accepted, 2022-11-24 21:56:25)
