Python pandas groupby values based on multiple column values
I have sequential campaign data in a pandas DataFrame.
# sample data
import pandas as pd

user_id = [9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,4705,4705,4705,4705,4705,223,223,223,223,223,223,223,223]
transaction_Value = [50,125,0,100,0,1000,473,0,47,110,0,44,93,0,49,92,0,242,0,75,0,47,122,0,50,100,200,0,35,85,0,50]
Campaign = ['M1','M1','Used','M1','Used','W1','Used','Used','W2','W2','Used','W2','W2','Used','W2','W2','Used','O1','Used','W3','Used','W2','S1','Lost','M1','M1','M1','Used','W2','S2','Lost','S2']
transaction_c = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,1,2,3,4,5,1,2,3,4,5,6,7,8]

df = pd.DataFrame(list(zip(user_id, transaction_Value, Campaign, transaction_c)),
                  columns=['user_id', 'transaction_Value', 'Campaign', 'transaction_c'])
So far I have used the following code to group the data:
df2 = (df.set_index(['user_id', df.groupby('user_id').cumcount()])['transaction_Value']
.unstack(fill_value='')
.reset_index())
This transposes the values based on the transaction number:
| user_id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---------|----|-----|-----|-----|----|------|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|
| 9 | 50 | 125 | 0 | 100 | 0 | 1000 | 473 | 0 | 47 | 110 | 0 | 44 | 93 | 0 | 49 | 92 | 0 | 242 | 0 |
| 223 | 50 | 100 | 200 | 0 | 35 | 85 | 0 | 50 | | | | | | | | | | | |
| 4705 | 75 | 0 | 47 | 122 | 0 | | | | | | | | | | | | | | |
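The pattern above — number rows within each group with `cumcount`, then pivot those positions into columns with `unstack` — can be illustrated on a tiny hypothetical frame (two users with unequal row counts, made up for demonstration):

```python
import pandas as pd

# Hypothetical frame: user 1 has three values, user 2 has two.
df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'val': [10, 20, 30, 40, 50]})

# cumcount numbers rows within each user: 0, 1, 2 for user 1; 0, 1 for user 2.
pos = df.groupby('user_id').cumcount()

# Using (user_id, position) as the index, unstack moves the position
# level into columns; fill_value='' blanks the missing cells.
wide = (df.set_index(['user_id', pos])['val']
          .unstack(fill_value='')
          .reset_index())
print(wide)
```

User 2's missing third position comes out as an empty string, matching the blank cells in the table above.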
How do I write code so that a new row starts every time the row's value is 'Used' or 'Lost'? I could do the same for the Campaign values and then stack these two DataFrames together.

Ideal output:
| user_id | Type | 1 | 2 | 3 | 4 |
|---------|-------------|------|------|------|------|
| 9 | Campaign | M1 | M1 | Used | |
| 9 | Campaign | M1 | Used | | |
| 9 | Campaign | W1 | Used | | |
| 9 | Campaign | Used | | | |
| 9 | Campaign | W2 | W2 | Used | |
| 9 | Campaign | W2 | W2 | Used | |
| 9 | Campaign | W2 | W2 | Used | |
| 9 | Campaign | O1 | Used | | |
| 223 | Campaign | M1 | M1 | M1 | Used |
| 223 | Campaign | W2 | S2 | Lost | |
| 223 | Campaign | S2 | | | |
| 9 | Transaction | 50 | 125 | 0 | |
| 9 | Transaction | 100 | 0 | | |
| 9 | Transaction | 1000 | 473 | | |
| 9 | Transaction | 0 | | | |
| 9 | Transaction | 47 | 110 | 0 | |
| 9 | Transaction | 44 | 93 | 0 | |
| 9 | Transaction | 49 | 92 | 0 | |
| 223 | Transaction | 242 | 0 | | |
| 223 | Transaction | 50 | 100 | 200 | 0 |
| 223 | Transaction | 35 | 85 | 0 | |
| 223 | Transaction | 50 | | | |
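The stacking step mentioned above could be done with `pd.concat`. A minimal sketch on two hypothetical one-row wide frames (the names and values are illustrative, not from the real data):

```python
import pandas as pd

# Hypothetical wide frames, one per column type.
camp = pd.DataFrame({'user_id': [9], 'Type': ['Campaign'], 1: ['M1'], 2: ['Used']})
tran = pd.DataFrame({'user_id': [9], 'Type': ['Transaction'], 1: [50], 2: [0]})

# Stack them vertically; ignore_index renumbers the rows 0..n-1.
stacked = pd.concat([camp, tran], ignore_index=True)
print(stacked)
```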
Appreciate all the help in resolving this. Thanks :)
Create groups by testing Campaign with Series.isin, change the order with iloc, and create group ids with Series.cumsum; add them to set_index and groupby, then use DataFrame.stack with sorting by the third level; last, remove the second level and convert the MultiIndex to columns:
# Flag rows that close a group, then a reversed cumulative sum gives every
# row up to and including a 'Used'/'Lost' marker the same group id
g = df['Campaign'].isin(['Used','Lost']).iloc[::-1].cumsum().iloc[::-1]
g = pd.factorize(g)[0]  # relabel the ids as consecutive integers

df2 = (df.set_index(['user_id', g, df.groupby(['user_id', g]).cumcount()])[['Campaign','transaction_Value']]
         .unstack(fill_value='')
         .stack(0)
         .sort_index(level=[2])
         .rename_axis(['user_id','Campaign','Type'])
         .reset_index(level=1, drop=True)
         .reset_index())
print (df2)
user_id Type 0 1 2 3
0 9 Campaign M1 M1 Used
1 9 Campaign M1 Used
2 9 Campaign W1 Used
3 9 Campaign Used
4 9 Campaign W2 W2 Used
5 9 Campaign W2 W2 Used
6 9 Campaign W2 W2 Used
7 9 Campaign O1 Used
8 223 Campaign M1 M1 M1 Used
9 223 Campaign W2 S2 Lost
10 223 Campaign S2
11 4705 Campaign W3 Used
12 4705 Campaign W2 S1 Lost
13 9 transaction_Value 50 125 0
14 9 transaction_Value 100 0
15 9 transaction_Value 1000 473
16 9 transaction_Value 0
17 9 transaction_Value 47 110 0
18 9 transaction_Value 44 93 0
19 9 transaction_Value 49 92 0
20 9 transaction_Value 242 0
21 223 transaction_Value 50 100 200 0
22 223 transaction_Value 35 85 0
23 223 transaction_Value 50
24 4705 transaction_Value 75 0
25 4705 transaction_Value 47 122 0
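The key step is the reversed cumulative sum: summing the 'Used'/'Lost' flags back to front means each marker row closes its own group, so every row up to and including a marker shares one id. A minimal sketch on a short hypothetical campaign sequence:

```python
import pandas as pd

# Hypothetical sequence: groups should end at each 'Used'/'Lost'.
s = pd.Series(['M1', 'M1', 'Used', 'W1', 'Lost', 'S2'])

mask = s.isin(['Used', 'Lost'])  # marker rows: F F T F T F
# Reverse, cumsum, reverse back: the id only changes AFTER a marker,
# so ('M1','M1','Used') share one id and ('W1','Lost') share another.
g = mask.iloc[::-1].cumsum().iloc[::-1]
print(g.tolist())  # [2, 2, 2, 1, 1, 0]
```

The ids run downward (2, 1, 0); that is why the answer passes them through `pd.factorize` to relabel them as consecutive integers in order of appearance.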