How to select top K items from 2 pandas DataFrames based on conditions?

Assume there are two DataFrames: visitor and group. visitor stores each visitor's information and a likelihood value for each item s/he selected. However, not every item has been purchased by every visitor. group stores which item family each item belongs to. The following are the toy DataFrames:

import numpy as np 
import pandas as pd
items = [11,12,13,14, 
         21,22,23,24,
         2,7,9,10]
col_names = [2,7,9,10,11,13,14,21,24]
np.random.seed(123)
nums = np.round(np.random.random(size = (3,9)),2)

visitor = pd.DataFrame(nums, index = (100,101,102))
visitor.columns = col_names

group = pd.DataFrame({'item':sorted(items),
                      'family':sorted(['a1','a2','a3']*4)})
print(visitor)

       2     7     9     10    11    13    14    21    24
100  0.70  0.29  0.23  0.55  0.72  0.42  0.98  0.68  0.48
101  0.39  0.34  0.73  0.44  0.06  0.40  0.74  0.18  0.18
102  0.53  0.53  0.63  0.85  0.72  0.61  0.72  0.32  0.36
print(group)

    item family
0      2     a1
1      7     a1
2      9     a1
3     10     a1
4     11     a2
5     12     a2
6     13     a2
7     14     a2
8     21     a3
9     22     a3
10    23     a3
11    24     a3

The goal is to select the top 2 items, ranked by value, that come from DIFFERENT item families. This is my code:

def Basket(df, x, num_items = 2):
    keys = list(df)   
    values = df.loc[x]   
    item_dict = dict([(i, j) for i, j in zip(keys, values)])
    output = list(dict(sorted(item_dict.items(), key=lambda kv: kv[1], reverse = True)))[:num_items]
    return output

print(Basket(df = visitor, x = 100))
[14, 11]  # 14 & 11 from the same family: a2

print(Basket(df = visitor, x = 101))
[14, 9] # 14 & 9 from different families: a2 & a1

I am not sure how to incorporate the group df into my code so that it selects the top 2 items from different families (based on the values and the item-family information), such as:

print(Basket(df1 = visitor, df2 = group, x = 100))
[14, 2]

print(Basket(df1 = visitor, df2 = group, x = 101))
[14, 9]

Note: 100, 101, and 102 are visitor ids (the row index). Any suggestions? Many thanks in advance.

Try:

def basket(visitor, x, number_items=2):
    return (visitor.loc[[x]].T                     # selecting visitor id and transposing 
                  .merge(group, 
                         left_index=True, 
                         right_on='item')          # merging with group dataframe 
                  .sort_values(x, ascending=False) # sorting by this visitor's values, highest first
                  .groupby('family')               # creating family groups
                  .head(1)                         # selecting one item from each group
                  .head(number_items)['item']      # Getting top n items
                  .to_numpy())                     # return numpy array

Output:

basket(visitor, 100, 2)
# array([14,  2], dtype=int64)

basket(visitor, 101, 2)
# array([14,  9], dtype=int64)
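Note that the function above reads group from the enclosing scope rather than taking it as an argument. A minimal sketch of the same idea with the item-family table passed in explicitly, matching the call shape the question asks for. The name basket_explicit is hypothetical, and the toy values are hardcoded from the printed tables so the example does not depend on the random seed:

```python
import pandas as pd

# Toy data copied from the printed tables in the question.
visitor = pd.DataFrame(
    [[0.70, 0.29, 0.23, 0.55, 0.72, 0.42, 0.98, 0.68, 0.48],
     [0.39, 0.34, 0.73, 0.44, 0.06, 0.40, 0.74, 0.18, 0.18],
     [0.53, 0.53, 0.63, 0.85, 0.72, 0.61, 0.72, 0.32, 0.36]],
    index=[100, 101, 102],
    columns=[2, 7, 9, 10, 11, 13, 14, 21, 24],
)
group = pd.DataFrame({
    "item": [2, 7, 9, 10, 11, 12, 13, 14, 21, 22, 23, 24],
    "family": ["a1"] * 4 + ["a2"] * 4 + ["a3"] * 4,
})


def basket_explicit(df1, df2, x, num_items=2):
    """Hypothetical helper: top items for visitor x, at most one per family."""
    # Rank this visitor's items from highest to lowest value.
    ranked = df1.loc[x].sort_values(ascending=False, kind="stable")
    # Look up the family of each ranked item (assumes every item is in df2).
    families = ranked.index.map(df2.set_index("item")["family"])
    # Keep only the first (best) item seen for each family.
    first_in_family = ~families.duplicated()
    return ranked.index[first_in_family][:num_items].tolist()


print(basket_explicit(df1=visitor, df2=group, x=100))
print(basket_explicit(df1=visitor, df2=group, x=101))
```

For visitors 100 and 101 this should reproduce the [14, 2] and [14, 9] baskets requested in the question. Items missing from df2 would map to NaN and be treated as one shared family, so this sketch assumes the family table is complete.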

You can merge your two DataFrames first:

out = visitor.rename_axis('visitor').melt(var_name='item', ignore_index=False) \
             .reset_index().merge(group, on='item')

out = out.loc[out.groupby(['visitor', 'family'])['value'].nlargest(2).index.levels[-1]] \
         .sort_values(['visitor', 'family', 'value'], ascending=[True, True, False], ignore_index=True)

Output:

>>> out
    visitor  item  value family
0       100     2   0.70     a1
1       100    10   0.55     a1
2       100    14   0.98     a2
3       100    11   0.72     a2
4       100    21   0.68     a3
5       100    24   0.48     a3
6       101     9   0.73     a1
7       101    10   0.44     a1
8       101    14   0.74     a2
9       101    13   0.40     a2
10      101    21   0.18     a3
11      101    24   0.18     a3
12      102    10   0.85     a1
13      102     9   0.63     a1
14      102    11   0.72     a2
15      102    14   0.72     a2
16      102    24   0.36     a3
17      102    21   0.32     a3

Intermediate result after the merge:

>>> out
    visitor  item  value family
0       100     2   0.70     a1
1       101     2   0.39     a1
2       102     2   0.53     a1
3       100     7   0.29     a1
4       101     7   0.34     a1
5       102     7   0.53     a1
6       100     9   0.23     a1
7       101     9   0.73     a1
8       102     9   0.63     a1
9       100    10   0.55     a1
10      101    10   0.44     a1
11      102    10   0.85     a1
12      100    11   0.72     a2
13      101    11   0.06     a2
14      102    11   0.72     a2
15      100    13   0.42     a2
16      101    13   0.40     a2
17      102    13   0.61     a2
18      100    14   0.98     a2
19      101    14   0.74     a2
20      102    14   0.72     a2
21      100    21   0.68     a3
22      101    21   0.18     a3
23      102    21   0.32     a3
24      100    24   0.48     a3
25      101    24   0.18     a3
26      102    24   0.36     a3
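The result above keeps the two best items within every family, whereas the question asks for two items per visitor drawn from different families. One possible way to get there from the same merged long table is sketched below. The variable names long_df, top and baskets are assumptions for illustration, and the toy values are hardcoded from the printed tables:

```python
import pandas as pd

# Toy data copied from the printed tables in the question.
visitor = pd.DataFrame(
    [[0.70, 0.29, 0.23, 0.55, 0.72, 0.42, 0.98, 0.68, 0.48],
     [0.39, 0.34, 0.73, 0.44, 0.06, 0.40, 0.74, 0.18, 0.18],
     [0.53, 0.53, 0.63, 0.85, 0.72, 0.61, 0.72, 0.32, 0.36]],
    index=[100, 101, 102],
    columns=[2, 7, 9, 10, 11, 13, 14, 21, 24],
)
group = pd.DataFrame({
    "item": [2, 7, 9, 10, 11, 12, 13, 14, 21, 22, 23, 24],
    "family": ["a1"] * 4 + ["a2"] * 4 + ["a3"] * 4,
})

# Long table: one row per (visitor, item) with its value and family.
long_df = (visitor.rename_axis("visitor")
                  .melt(var_name="item", ignore_index=False)
                  .reset_index()
                  .merge(group, on="item"))

top = (long_df.sort_values("value", ascending=False, kind="stable")
              .drop_duplicates(["visitor", "family"])  # best item per visitor and family
              .groupby("visitor").head(2))             # two best families per visitor

# One list of items per visitor, best item first.
baskets = top.groupby("visitor")["item"].agg(list)
print(baskets)
```

Visitor 102 has a tie at 0.72 between items 11 and 14 (both in family a2), so which of the two appears in that basket depends on the tie-breaking of the sort.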

