Rule-based recommendation system
I have a data frame that contains the names of 3 friends and 10 restaurants. For each friend, every restaurant has an InterestRank from 1 to 10, where rank 1 means most likely to be interested and rank 10 means least likely. The data frame also contains restaurant attributes such as Cost, Cuisine and whether alcohol is served. The data frame looks like the following:
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
I want to recommend the top 4 restaurants to each friend according to their InterestRank, with the condition that no more than 2 restaurants with the same cuisine type are recommended to each of them. How can I achieve this in a Pythonic way?
Edit: Expected output data frame
I want the final output to be something like this:
| FriendName | Restaurant | RecommendationRank |
|---|---|---|
| Amy | R2 | 1 |
| Amy | R1 | 2 |
| Amy | R10 | 3 |
| Amy | R7 | 4 |
| Ben | R7 | 1 |
| Ben | R2 | 2 |
| Ben | R5 | 3 |
| Ben | R8 | 4 |
| Cathy | R5 | 1 |
| Cathy | R8 | 2 |
| Cathy | R6 | 3 |
| Cathy | R9 | 4 |
We can use sort_values and groupby to achieve these types of window functions in a pandas.DataFrame: sort the rows by rank, then keep the first n rows of each group with head.
from io import StringIO
import pandas as pd
input_data = """
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
""".strip()
# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))
# Use sorting and grouping, along with `head`,
# to achieve the desired window functions
result = (
    df
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)
print(result)
The result:
    FriendName  Restaurant  InterestRank   Cuisine  Cost  Alcohol
 0         Amy          R2             1    French    $$       No
 1         Amy          R1             2    French   $$$      Yes
 2         Amy         R10             5   Mexican   $$$      Yes
 3         Amy          R7             6  Japanese    $$      Yes
 4         Ben          R7             1  Japanese    $$      Yes
 5         Ben          R5             2  Japanese    $$       No
 6         Ben          R2             3    French    $$       No
 7         Ben          R8             4   Mexican    $$       No
 8       Cathy          R5             1  Japanese    $$       No
 9       Cathy          R8             2   Mexican    $$       No
10       Cathy          R6             3  Japanese     $       No
11       Cathy          R9             4   Mexican    $$       No
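The expected output in the question has only FriendName, Restaurant and a RecommendationRank column, which the result above does not include. A minimal sketch of one way to derive it, assuming RecommendationRank simply means each restaurant's position within a friend's final list (the names `csv_text`, `top` and `recommendations` are illustrative, not from the original answer):

```python
from io import StringIO

import pandas as pd

csv_text = """
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
""".strip()

df = pd.read_csv(StringIO(csv_text))

# Same filtering idea as above: at most 2 per (friend, cuisine), then top 4 per friend.
# `head` keeps the sorted row order, so a single sort by rank is enough here.
top = (
    df
    .sort_values(by=['FriendName', 'InterestRank'])
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    .groupby('FriendName')
    .head(4)
)

# Number the surviving rows 1..4 within each friend (assumed meaning of the column)
recommendations = (
    top
    .assign(RecommendationRank=top.groupby('FriendName').cumcount() + 1)
    .loc[:, ['FriendName', 'Restaurant', 'RecommendationRank']]
    .reset_index(drop=True)
)
print(recommendations)
```

Here `cumcount` numbers rows in their order of appearance within each group, so it relies on the frame already being sorted by InterestRank.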
What if we want to add 2 conditions instead? For example, no more than 2 restaurants with the same cuisine type, and also no more than 2 restaurants with "No" in Alcohol, recommended to each of them.
# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))
# Take top 2 "no alcohol" restaurants per friend
no_df = (
    df[df.Alcohol == 'No']
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(2)
)
# Take top 4 alcoholic restaurants per friend
# (we don't mind if ultimately all 4 are alcohol restaurants
# in the final result, as there is no restriction on these)
yes_df = (
    df[df.Alcohol == 'Yes']
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
)
# Concatenate and then proceed as before
result = (
    pd.concat([no_df, yes_df], axis=0)
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)
print(result)
The result:
    FriendName  Restaurant  InterestRank   Cuisine  Cost  Alcohol
 0         Amy          R2             1    French    $$       No
 1         Amy          R1             2    French   $$$      Yes
 2         Amy         R10             5   Mexican   $$$      Yes
 3         Amy          R6             7  Japanese     $       No
 4         Ben          R7             1  Japanese    $$      Yes
 5         Ben          R5             2  Japanese    $$       No
 6         Ben          R2             3    French    $$       No
 7         Ben          R4             5    French   $$$      Yes
 8       Cathy          R5             1  Japanese    $$       No
 9       Cathy          R8             2   Mexican    $$       No
10       Cathy          R1             5    French   $$$      Yes
11       Cathy          R3             6    French    $$      Yes
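Because the filters above are applied one after another, a restaurant dropped by an early filter is never reconsidered. For Amy, R7 (rank 6, alcohol served) falls outside her top 4 alcohol-serving restaurants, so the result ends with R6 (rank 7) instead. An alternative, sketched here under the assumption that "top 4" means walking each friend's list in rank order and skipping anything that would break a cap, is a greedy pass. The function `recommend` and its parameter names are hypothetical, not part of the original answer:

```python
from collections import Counter
from io import StringIO

import pandas as pd

csv_text = """
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
""".strip()


def recommend(group, top_n=4, max_per_cuisine=2, max_no_alcohol=2):
    """Greedy pass for one friend (hypothetical helper): visit restaurants
    in rank order and keep each one that does not break a cap."""
    cuisine_counts = Counter()
    no_alcohol = 0
    picked = []
    for row in group.sort_values('InterestRank').itertuples(index=False):
        if cuisine_counts[row.Cuisine] >= max_per_cuisine:
            continue  # already have enough of this cuisine
        if row.Alcohol == 'No' and no_alcohol >= max_no_alcohol:
            continue  # already have enough no-alcohol restaurants
        picked.append(row.Restaurant)
        cuisine_counts[row.Cuisine] += 1
        if row.Alcohol == 'No':
            no_alcohol += 1
        if len(picked) == top_n:
            break
    return picked


df = pd.read_csv(StringIO(csv_text))
greedy = {friend: recommend(group) for friend, group in df.groupby('FriendName')}
print(greedy)
```

On this data the greedy pass agrees with the result above for Ben and Cathy, and gives Amy R2, R1, R10, R7. Whether that is preferable depends on how the rules are meant to interact; the explicit loop trades the brevity of the chained pandas calls for rules that are easier to extend.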