
Can I create a column where each row is a running list in a Pandas DataFrame using groupby?

Imagine I have a Pandas DataFrame:

import pandas as pd

# create df
df = pd.DataFrame({'id': [1,1,1,2,2,2],
                   'val': [5,4,6,3,2,3]})

Let's assume it is ordered by 'id' and by an imaginary date column (not shown), ascending. I want to create another column where each row holds the list of all 'val' values for that 'id' up to and including that date.

The resulting DataFrame should look like this:

df = pd.DataFrame({'id': [1,1,1,2,2,2],
                   'val': [5,4,6,3,2,3],
                   'val_list': [[5],[5,4],[5,4,6],[3],[3,2],[3,2,3]]})

I don't want to use a loop because the actual df I am working with has about 4 million records. I am imagining I would use a lambda function in conjunction with groupby (something like this):

df['val_list'] = df.groupby('id')['val'].apply(lambda x: x.runlist())

This raises an AttributeError because the runlist() method does not exist, but I think the solution would look something like this.

Does anyone know what to do to solve this problem?

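Try wrapping each value in a one-element list and then taking a cumulative sum within each group. Adding lists concatenates them, so the cumulative sum is the running list. Passing group_keys=False keeps the original row index, which newer pandas versions (2.0 and later, as far as I know) need for the result to align with df on assignment:

df['new'] = df.val.map(lambda x : [x]).groupby(df.id, group_keys=False).apply(lambda x : x.cumsum())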
Out[138]: 
0          [5]
1       [5, 4]
2    [5, 4, 6]
3          [3]
4       [3, 2]
5    [3, 2, 3]
Name: val, dtype: object
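
If the cumsum on a column of lists behaves differently in your pandas version, the same running lists can be built with itertools.accumulate from the standard library. This is only a sketch under assumptions: running_lists is a helper name made up here, it needs Python 3.8+ for the initial= argument, and it still loops once per group (not once per row), so I have not checked how it performs on 4 million records.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'val': [5, 4, 6, 3, 2, 3]})

def running_lists(values):
    # Hypothetical helper: returns [[v0], [v0, v1], ...] for one group.
    # Each step builds a new list, so the rows do not share a mutable object.
    prefixes = accumulate(values, lambda acc, v: acc + [v], initial=[])
    return list(prefixes)[1:]  # drop the empty initial list

parts = []
for _, group in df.groupby('id', sort=False)['val']:
    # Keep the group's original index so the pieces align with df.
    parts.append(pd.Series(running_lists(group.tolist()), index=group.index))

df['val_list'] = pd.concat(parts)
print(df)
```

Either way, every row stores its own copy of the prefix, so memory grows roughly with the square of the group size. That is worth keeping in mind if some ids have very long histories.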
