Labeling customers from transactional dataframe (pandas)
I'm working on a transactional dataset of customers who have purchased 'X'. Here's the logic:
If the customer has a purchase history prior to purchasing item 'X' (e.g. custID 1), label that customer 'great'.
If the customer purchased X only once and nothing else (e.g. custID 2, 3), label that customer 'boo'.
If the customer purchased X as their first purchase and then purchased other items (e.g. custID 4), label that customer 'awesome'.
I would like to write this as a Python function. Any suggestions would be highly appreciated.
Current output:
import pandas as pd

# 'records' instead of 'list' so the built-in list type is not shadowed
records = [(1, 111, '2016-01-10', 'A'), (1, 112, '2016-02-02', 'B'), (1, 112, '2016-02-02', 'C'), (1, 113, '2016-04-10', 'X'), (2, 211, '2016-02-02', 'X'),
           (3, 311, '2016-04-05', 'X'), (4, 411, '2016-02-05', 'X'), (4, 411, '2016-02-05', 'C'), (4, 412, '2016-03-10', 'E'), (4, 413, '2016-07-14', 'E')]
labels = ['custID', 'transacID', 'orderDate', 'itemDescription']
df = pd.DataFrame.from_records(records, columns=labels)
df
   custID  transacID   orderDate  itemDescription
0       1        111  2016-01-10                A
1       1        112  2016-02-02                B
2       1        112  2016-02-02                C
3       1        113  2016-04-10                X
4       2        211  2016-02-02                X
5       3        311  2016-04-05                X
6       4        411  2016-02-05                X
7       4        411  2016-02-05                C
8       4        412  2016-03-10                E
9       4        413  2016-07-14                E
Expected output:
   custID  transacID   orderDate  itemDescription    label
0       1        111  2016-01-10                A    great
1       1        112  2016-02-02                B    great
2       1        112  2016-02-02                C    great
3       1        113  2016-04-10                X    great
4       2        211  2016-02-02                X      boo
5       3        311  2016-04-05                X      boo
6       4        411  2016-02-05                X  awesome
7       4        411  2016-02-05                C  awesome
8       4        412  2016-03-10                E  awesome
9       4        413  2016-07-14                E  awesome
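Before reaching for pandas, the three rules can be pinned down on a single customer. This is a minimal plain-Python sketch; label_customer is a hypothetical helper, and it assumes the customer's item descriptions are already in chronological order and that "purchased X only once" means X is the customer's only row.

```python
def label_customer(items, target='X'):
    """Hypothetical helper: label one customer from items listed in purchase order."""
    if target not in items:
        return None            # never bought X: none of the three rules applies
    if len(items) == 1:
        return 'boo'           # the single purchase was X
    if items[0] == target:
        return 'awesome'       # X came first, other items followed
    return 'great'             # other items were bought before X


customers = {1: ['A', 'B', 'C', 'X'], 2: ['X'], 3: ['X'], 4: ['X', 'C', 'E', 'E']}
print({cid: label_customer(items) for cid, items in customers.items()})
# {1: 'great', 2: 'boo', 3: 'boo', 4: 'awesome'}
```

The order of the checks matters: the single-row case is tested before the "X first" case, because a lone X purchase would otherwise also satisfy items[0] == target.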
Here is a solution using groupby and apply with a custom function:
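def categorize(g):
    # g holds every row of one customer, assumed to be in chronological order
    first_item = g.iloc[0]['itemDescription']
    if len(g) > 1 and first_item != 'X':
        # other items were bought before X
        label = 'great'
    elif len(g) > 1 and first_item == 'X':
        # X was the first purchase, other items followed
        label = 'awesome'
    else:
        # X was the customer's only purchase
        label = 'boo'
    return g.assign(label=label)

# group_keys=False keeps the original flat index instead of adding custID as an index level
df.groupby('custID', group_keys=False).apply(categorize)
#    custID  transacID   orderDate  itemDescription    label
# 0       1        111  2016-01-10                A    great
# 1       1        112  2016-02-02                B    great
# 2       1        112  2016-02-02                C    great
# 3       1        113  2016-04-10                X    great
# 4       2        211  2016-02-02                X      boo
# 5       3        311  2016-04-05                X      boo
# 6       4        411  2016-02-05                X  awesome
# 7       4        411  2016-02-05                C  awesome
# 8       4        412  2016-03-10                E  awesome
# 9       4        413  2016-07-14                E  awesome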
There is probably a more pandorable solution to this.
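One possible vectorized direction is to compute a few per-customer values with groupby(...).transform, which broadcasts each group's result back onto every row, and then pick the label with numpy.select. This is a sketch under stated assumptions, not a tested drop-in: it compares order dates rather than row positions, so row order does not matter; it treats items bought on the same date as X (such as transaction 411) as part of the first purchase; and it assumes every customer bought X at some point, since customers who never did would fall through to 'awesome'.

```python
import numpy as np
import pandas as pd

records = [(1, 111, '2016-01-10', 'A'), (1, 112, '2016-02-02', 'B'), (1, 112, '2016-02-02', 'C'),
           (1, 113, '2016-04-10', 'X'), (2, 211, '2016-02-02', 'X'), (3, 311, '2016-04-05', 'X'),
           (4, 411, '2016-02-05', 'X'), (4, 411, '2016-02-05', 'C'), (4, 412, '2016-03-10', 'E'),
           (4, 413, '2016-07-14', 'E')]
df = pd.DataFrame.from_records(records, columns=['custID', 'transacID', 'orderDate', 'itemDescription'])

dates = pd.to_datetime(df['orderDate'])
is_x = df['itemDescription'] == 'X'

# Per-customer values, broadcast back onto every row of that customer.
first_date = dates.groupby(df['custID']).transform('min')                # earliest order of any item
first_x_date = dates.where(is_x).groupby(df['custID']).transform('min')  # earliest order of X (NaT if none)
n_rows = df.groupby('custID')['transacID'].transform('count')            # purchased items per customer

df['label'] = np.select(
    [n_rows == 1,                  # one row only: bought X once and nothing else
     first_date < first_x_date],   # something was bought on an earlier date than X
    ['boo', 'great'],
    default='awesome',             # X was in the first order, other items came later
)
print(df)
```

np.select takes the first condition that is true for each row, so the 'boo' check is listed first; for the sample data the resulting label column matches the expected output above.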