Conditional grouped CumCount pandas
I have this DataFrame:
import pandas as pd
dic = {'users' : ['A','A','B','A','A','B','A','A','A','A','A','B','A'],
'product' : [1,1,2,2,1,2,1,2,1,1,2,1,1],
'action' : ['see', 'see', 'see', 'see', 'buy', 'buy', 'see', 'see', 'see', 'see', 'buy', 'buy', 'buy']
}
df = pd.DataFrame(dic, columns=dic.keys())
df
   users  product action
0      A        1    see
1      A        1    see
2      B        2    see
3      A        2    see
4      A        1    buy
5      B        2    buy
6      A        1    see
7      A        2    see
8      A        1    see
9      A        1    see
10     A        2    buy
11     B        1    buy
12     A        1    buy
What I need is a column that counts how many times each user saw a product before buying it.
The result should look like this:
dic = {'users' : ['A','A','B','A','A','B','A','A','A','A','A','B','A'],
'product' : [1,1,2,2,1,2,1,2,1,1,2,1,1],
'action' : ['see', 'see', 'see', 'see', 'buy', 'buy', 'see', 'see', 'see', 'see', 'buy', 'buy', 'buy'],
'see_before_buy' : [1,2,1,1,2,1,1,2,2,3,2,0,3]
}
   users  product action  see_before_buy
0      A        1    see               1
1      A        1    see               2
2      B        2    see               1
3      A        2    see               1
4      A        1    buy               2
5      B        2    buy               1
6      A        1    see               1
7      A        2    see               2
8      A        1    see               2
9      A        1    see               3
10     A        2    buy               2
11     B        1    buy               0
12     A        1    buy               3
Can anyone help me?
You may need to create an additional key for groupby by using cumsum after shift:
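# helper key: number of earlier 'buy' rows within each (user, product) group
addkey = df.groupby(['user', '#product']).action.apply(lambda x: x.eq('buy').shift().fillna(0).cumsum())
# running count of 'see' rows within each (user, product, addkey) group
df['seebefore'] = df.action.eq('see').groupby([df.user, df['#product'], addkey]).cumsum()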
df
Out[131]:
    index user  #product action  seebefore
0       0    A         1    see        1.0
1       1    A         1    see        2.0
2       2    B         2    see        1.0
3       3    A         2    see        1.0
4       4    A         1    buy        2.0
5       5    B         2    buy        1.0
6       6    A         1    see        1.0
7       7    A         2    see        2.0
8       8    A         1    see        2.0
9       9    A         1    see        3.0
10     10    A         2    buy        2.0
11     11    B         1    buy        0.0
12     12    A         1    buy        3.0
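For reference, here is a minimal sketch of the same shift-then-cumsum idea, adapted to the column names used in the question (`users`, `product`) and assuming a reasonably recent pandas. It swaps `apply` for `transform` so that the helper key keeps the original row index. The names `keys`, `is_buy`, `block`, and `is_see` are illustrative choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'users': ['A', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
    'product': [1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1],
    'action': ['see', 'see', 'see', 'see', 'buy', 'buy', 'see',
               'see', 'see', 'see', 'buy', 'buy', 'buy'],
})

keys = [df['users'], df['product']]

# For each row, count the 'buy' rows that came strictly before it
# within the same (user, product) group: shift first, then cumsum.
is_buy = df['action'].eq('buy').astype(int)
block = is_buy.groupby(keys).transform(lambda s: s.shift(fill_value=0).cumsum())

# Running count of 'see' rows inside each (user, product, block) group.
# A 'buy' row adds 0, so it carries the count accumulated so far.
is_see = df['action'].eq('see').astype(int)
df['see_before_buy'] = is_see.groupby(keys + [block]).cumsum()

print(df)
```

Casting the boolean masks to `int` keeps the result column as integers rather than floats.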
One approach:
First, get all users and products
users=list(df.users.unique())
products=list(df.products.unique())
Create a dictionary of user/product combinations to track how many times each user has seen each product
see_dict={user:{product:0 for product in products} for user in users}
#{'A': {1: 0, 2: 0}, 'B': {1: 0, 2: 0}}
Initialize an empty column
df["see_before_buy"]=None
Now, for each row: if it is a see action, update the dictionary (increment) and assign the value. If it is a buy action, only assign the value and reset the counter
for i in range(len(df)):
    user=df.loc[i,"users"]
    product=df.loc[i,"products"]
    if(df.loc[i,"action"]=="see"): #if the action is see
        see_dict[user][product]+=1 #increment the see dictionary
        df.loc[i,"see_before_buy"]=see_dict[user][product] #assign this value for this row
    else: #buy action
        df.loc[i,"see_before_buy"]=see_dict[user][product] #assign the current value
        see_dict[user][product]=0 #reset the counter
Output
   users  products action see_before_buy
0      A         1    see              1
1      A         1    see              2
2      B         2    see              1
3      A         2    see              1
4      A         1    buy              2
5      B         2    buy              1
6      A         1    see              1
7      A         2    see              2
8      A         1    see              2
9      A         1    see              3
10     A         2    buy              2
11     B         1    buy              0
12     A         1    buy              3
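The same counter logic can also be sketched without per-row `df.loc` lookups, which tend to be slow on larger frames. This variant is only an illustration: it assumes the `products` column name used in this answer, and the names `counts` and `result` are made up for the example.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'users': ['A', 'A', 'B', 'A', 'A', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'A'],
    'products': [1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1],
    'action': ['see', 'see', 'see', 'see', 'buy', 'buy', 'see',
               'see', 'see', 'see', 'buy', 'buy', 'buy'],
})

# (user, product) -> number of 'see' rows since the last 'buy';
# missing keys start at 0, so no need to pre-build the dictionary.
counts = defaultdict(int)
result = []

for user, product, action in zip(df['users'], df['products'], df['action']):
    key = (user, product)
    if action == 'see':
        counts[key] += 1            # one more view of this product
        result.append(counts[key])
    else:
        result.append(counts[key])  # views accumulated before this buy
        counts[key] = 0             # reset the counter after a buy

df['see_before_buy'] = result
print(df)
```

Building the column from a plain list in one assignment also gives it an integer dtype instead of `object`.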