I have a dataframe like this:
user_id | visit_order | campaign
1 | 1 | C1
1 | 2 | C2
1 | 3 | C3
2 | 1 | C2
2 | 2 | C1
I want to create a column that, for each user_id, lists the campaigns seen so far in visit order, giving an output like this:
user_id | visit_order | campaign | campaign_order
1 | 1 | C1 | [C1]
1 | 2 | C2 | [C1, C2]
1 | 3 | C3 | [C1, C2, C3]
2 | 1 | C2 | [C2]
2 | 2 | C1 | [C2, C1]
Is it possible to do this? Any help is much appreciated :)
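In other words, campaign_order is a running (prefix) list of campaigns per user. Here is a minimal pandas-free sketch of that logic. It assumes the rows arrive as (user_id, visit_order, campaign) tuples, and the function name campaign_orders is hypothetical:

```python
from collections import defaultdict


def campaign_orders(rows):
    """rows: iterable of (user_id, visit_order, campaign) tuples (assumed shape)."""
    seen = defaultdict(list)  # campaigns seen so far, keyed by user_id
    result = []
    # Sort by (user_id, visit_order) so each prefix follows the visit order
    for user_id, visit_order, campaign in sorted(rows, key=lambda r: (r[0], r[1])):
        seen[user_id].append(campaign)
        # Copy the list so later appends do not alter earlier rows' snapshots
        result.append((user_id, visit_order, campaign, list(seen[user_id])))
    return result


rows = [(1, 1, "C1"), (1, 2, "C2"), (1, 3, "C3"), (2, 1, "C2"), (2, 2, "C1")]
for row in campaign_orders(rows):
    print(row)
```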
You need to use a custom function to simulate an "expanding" window, since pandas' built-in expanding windows cannot return lists.
NB. The data must first be sorted by "visit_order" within each "user_id".
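import pandas as pd

def expand(ser):
    # Build a running list of the values seen so far in this group
    seen = []
    out = []
    for e in ser:
        seen.append(e)
        out.append(seen.copy())  # copy so each row keeps its own snapshot
    return pd.Series(out, index=ser.index)

# Sort so campaigns are accumulated in visit order
df = df.sort_values(['user_id', 'visit_order'])
df['campaign_order'] = df.groupby('user_id', group_keys=False)['campaign'].apply(expand)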
output:
user_id visit_order campaign campaign_order
0 1 1 C1 [C1]
1 1 2 C2 [C1, C2]
2 1 3 C3 [C1, C2, C3]
3 2 1 C2 [C2]
4 2 2 C1 [C2, C1]
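For a quick end-to-end check, here is a self-contained sketch that rebuilds the sample frame from the question and replaces the manual loop with itertools.accumulate, which yields each running prefix directly. The helper name running_lists is hypothetical; the resulting campaign_order column should match the output above:

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "visit_order": [1, 2, 3, 1, 2],
    "campaign": ["C1", "C2", "C3", "C2", "C1"],
})


def running_lists(ser: pd.Series) -> pd.Series:
    # Concatenate one-element lists: [C1], [C1, C2], [C1, C2, C3], ...
    # acc + x builds a new list each step, so rows never share a list object
    prefixes = accumulate([[c] for c in ser], lambda acc, x: acc + x)
    return pd.Series(list(prefixes), index=ser.index)


# Sort first so the prefixes follow visit_order within each user
df = df.sort_values(["user_id", "visit_order"])
df["campaign_order"] = (
    df.groupby("user_id", group_keys=False)["campaign"].apply(running_lists)
)
print(df)
```

Because the helper returns a Series carrying the group's original index, the assignment aligns each list back to its row.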
Actually, this worked perfectly:
df_test.groupby(['user_id','campaign','visit_order'])['campaign'].agg([('a', ';'.join), ('b', lambda x: x.tolist())])