
Complex Concatenation on a pandas dataframe

I have a dataframe like this:

user_id | visit_order  | campaign 

1  | 1    |     C1
1  | 2    |     C2
1  | 3    |     C3
2  | 1    |     C2
2  | 2    |     C1

I want to create a column that takes the visit order at user_id level and gives an output like this:

user_id | visit_order  | campaign | campaign_order

1  | 1    |     C1  |  [C1]
1  | 2    |     C2  |  [C1, C2]
1  | 3    |     C3  |  [C1, C2, C3]
2  | 1    |     C2  |  [C2]
2  | 2    |     C1  |  [C2, C1]

Is it possible to do this? Any help is much appreciated:)

You need to use a custom function to simulate an expanding window.

NB. the data must be sorted by "visit_order" first.
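As a minimal sketch of that sorting step (the shuffled input below is hypothetical, built from the rows in the question), one way to order the visits within each user is:

```python
import pandas as pd

# Hypothetical unsorted input: the question's rows in shuffled order.
df = pd.DataFrame({
    "user_id": [1, 2, 1, 2, 1],
    "visit_order": [3, 2, 1, 1, 2],
    "campaign": ["C3", "C1", "C1", "C2", "C2"],
})

# Order visits within each user, then rebuild a clean 0..n-1 index.
df = df.sort_values(["user_id", "visit_order"]).reset_index(drop=True)
print(df)
```

With the rows ordered this way, the function below can be applied per user.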

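import pandas as pd

def expand(ser):
    # Build the running list of values seen so far within one group.
    l = []
    out = []
    for e in ser:
        l.append(e)
        # Store a copy so later appends do not alter earlier rows.
        out.append(l.copy())
    return pd.Series(out, index=ser.index)

# Apply per user; group_keys=False keeps the original row index.
df['campaign_order'] = df.groupby('user_id', group_keys=False)['campaign'].apply(expand)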

output:

   user_id  visit_order campaign campaign_order
0        1            1       C1           [C1]
1        1            2       C2       [C1, C2]
2        1            3       C3   [C1, C2, C3]
3        2            1       C2           [C2]
4        2            2       C1       [C2, C1]
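For comparison, the same running lists could probably also be built with `itertools.accumulate` instead of an explicit loop. This is only a sketch of an alternative, not the method from the answer above; the helper name `running_lists` is made up for illustration, and the data is assumed to be already sorted by "visit_order".

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question, already sorted by visit_order.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "visit_order": [1, 2, 3, 1, 2],
    "campaign": ["C1", "C2", "C3", "C2", "C1"],
})

def running_lists(ser):
    # Hypothetical helper: accumulate() adds the one-element lists
    # left to right ([C1], [C1] + [C2], ...); each "+" creates a
    # new list, so the rows do not share list objects.
    return pd.Series(list(accumulate([e] for e in ser)), index=ser.index)

df["campaign_order"] = (
    df.groupby("user_id", group_keys=False)["campaign"].apply(running_lists)
)
print(df)
```

Both versions copy the list at every step, so the cost grows quadratically with the number of visits per user; that should be fine for short visit histories.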

Actually, this worked perfectly:

df_test.groupby(['user_id','campaign','visit_order'])['campaign'].agg([('a', ';'.join), ('b', lambda x: x.tolist())])


 