Labeling customers from a transactional DataFrame (pandas)

Question

I'm working on a transactional dataset for customers that have purchased 'X'. Here's the logic:

If the customer has a purchase history prior to purchasing item 'X' (ex. custID 1), then label that customer as 'great'.

If the customer purchased X only once and has no other purchases (ex. custID 2, 3), then label that customer as 'boo'.

If the customer purchased X as their first purchase and then purchased other items (ex. custID 4), then label that customer as 'awesome'.

I would like to write this function using Python. Any suggestions would be highly appreciated.

Current Output:

import pandas as pd

# Named `records` rather than `list` to avoid shadowing the built-in.
records = [(1, 111, '2016-01-10', 'A'), (1, 112, '2016-02-02', 'B'), (1, 112, '2016-02-02', 'C'), (1, 113, '2016-04-10', 'X'), (2, 211, '2016-02-02', 'X'),
           (3, 311, '2016-04-05', 'X'), (4, 411, '2016-02-05', 'X'), (4, 411, '2016-02-05', 'C'), (4, 412, '2016-03-10', 'E'), (4, 413, '2016-07-14', 'E')]
labels = ['custID', 'transacID', 'orderDate', 'itemDescription']
df = pd.DataFrame.from_records(records, columns=labels)
df
      custID transacID orderDate itemDescription
0       1        111  2016-01-10               A
1       1        112  2016-02-02               B
2       1        112  2016-02-02               C
3       1        113  2016-04-10               X
4       2        211  2016-02-02               X
5       3        311  2016-04-05               X
6       4        411  2016-02-05               X
7       4        411  2016-02-05               C
8       4        412  2016-03-10               E
9       4        413  2016-07-14               E

Expected Output:

      custID transacID orderDate itemDescription  label
0       1        111  2016-01-10               A  great
1       1        112  2016-02-02               B  great
2       1        112  2016-02-02               C  great
3       1        113  2016-04-10               X  great
4       2        211  2016-02-02               X  boo
5       3        311  2016-04-05               X  boo
6       4        411  2016-02-05               X  awesome
7       4        411  2016-02-05               C  awesome
8       4        412  2016-03-10               E  awesome
9       4        413  2016-07-14               E  awesome

Answer 1

Here is a solution that uses groupby and apply with a custom function:

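def categorize(g):
    # Relies on each customer's rows already being in purchase order.
    g = g.copy()  # avoid mutating the group that pandas passes in
    first_is_x = g.iloc[0]['itemDescription'] == 'X'
    if len(g) > 1 and not first_is_x:
        g['label'] = 'great'      # other purchases came before X
    elif len(g) > 1 and first_is_x:
        g['label'] = 'awesome'    # X first, then other items
    else:
        g['label'] = 'boo'        # a single purchase of X
    return g

# group_keys=False keeps the original row index in the result.
df.groupby('custID', group_keys=False).apply(categorize)
#    custID  transacID   orderDate itemDescription    label
# 0       1        111  2016-01-10               A    great
# 1       1        112  2016-02-02               B    great
# 2       1        112  2016-02-02               C    great
# 3       1        113  2016-04-10               X    great
# 4       2        211  2016-02-02               X      boo
# 5       3        311  2016-04-05               X      boo
# 6       4        411  2016-02-05               X  awesome
# 7       4        411  2016-02-05               C  awesome
# 8       4        412  2016-03-10               E  awesome
# 9       4        413  2016-07-14               E  awesome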

There is probably a more pandorable solution to this.
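One possible vectorized variant is sketched below. It is an illustration, not part of the original answer. It assumes every customer in the frame bought 'X' at least once (as the question states) and compares dates rather than row positions, so it does not depend on how rows are ordered. The intermediate names `first_x`, `first_any`, and `n_rows` are made up for this sketch.

```python
import numpy as np
import pandas as pd

records = [
    (1, 111, '2016-01-10', 'A'), (1, 112, '2016-02-02', 'B'),
    (1, 112, '2016-02-02', 'C'), (1, 113, '2016-04-10', 'X'),
    (2, 211, '2016-02-02', 'X'), (3, 311, '2016-04-05', 'X'),
    (4, 411, '2016-02-05', 'X'), (4, 411, '2016-02-05', 'C'),
    (4, 412, '2016-03-10', 'E'), (4, 413, '2016-07-14', 'E'),
]
df = pd.DataFrame.from_records(
    records, columns=['custID', 'transacID', 'orderDate', 'itemDescription'])

dates = pd.to_datetime(df['orderDate'])
cust = df['custID']

# Date of each customer's first 'X' purchase, broadcast to all of their rows.
# Non-'X' rows become NaT and are skipped by the min.
first_x = dates.where(df['itemDescription'] == 'X').groupby(cust).transform('min')
# Date of each customer's earliest purchase of any item.
first_any = dates.groupby(cust).transform('min')
# Number of rows (purchased items) per customer.
n_rows = df.groupby('custID')['itemDescription'].transform('count')

# Conditions are checked in order; the first match wins.
df['label'] = np.select(
    [first_any < first_x,   # something was bought before the first X
     n_rows > 1],           # X came first, other items followed
    ['great', 'awesome'],
    default='boo',          # a single purchase of X
)
```

A customer who never bought 'X' would get a NaT in `first_x` and would be labeled by the row count alone, so such customers should be filtered out first if they can occur.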

Answer 1: 4, accepted, 2018-04-25 20:05:06