Python Pandas: multiple aggregations -> list of values

I have a DataFrame which contains the results of multiple aggregation functions applied to multiple columns, for example:

import numpy as np
import pandas as pd

bar = pd.DataFrame([
    {'a': 1, 'b': 2, 'grp': 0}, {'a': 3, 'b': 8, 'grp': 0},
    {'a': 2, 'b': 2, 'grp': 1}, {'a': 4, 'b': 5, 'grp': 1}
])
bar.groupby('grp').agg([np.mean, np.std])

       a              b
    mean       std mean       std
grp
0      2  1.414214  5.0  4.242641
1      3  1.414214  3.5  2.121320

I want to combine the aggregation results for each column into lists (or tuples):

grp        a                 b  
0   [2, 1.414214]     [5.0, 4.242641]
1   [3, 1.414214]     [3.5, 2.121320]

What would be the proper way to do this?

Thanks in advance!

If you have to use lists in columns, you can do this:

In [60]:  bar.groupby('grp').agg(lambda x: [x.mean(), x.std()])
Out[60]:
                             a                          b
grp
0    [2.0, 1.4142135623730951]   [5.0, 4.242640687119285]
1    [3.0, 1.4142135623730951]  [3.5, 2.1213203435596424]

Storing data like this in pandas is not recommended, though.
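If the aggregated DataFrame from the question already exists, the list columns can also be built from it directly instead of re-running the groupby. The following is only a minimal sketch under that assumption; the names `agg` and `combined` are illustrative and do not come from the original posts, and the string names 'mean' and 'std' are used in place of np.mean and np.std.

```python
import pandas as pd

bar = pd.DataFrame([
    {'a': 1, 'b': 2, 'grp': 0}, {'a': 3, 'b': 8, 'grp': 0},
    {'a': 2, 'b': 2, 'grp': 1}, {'a': 4, 'b': 5, 'grp': 1},
])

# The already-aggregated frame from the question (two-level columns).
agg = bar.groupby('grp').agg(['mean', 'std'])

# For each top-level column, pack every row of its (mean, std)
# sub-frame into a Python list. The result has object dtype.
combined = pd.DataFrame({
    col: pd.Series(agg[col].to_numpy().tolist(), index=agg.index)
    for col in agg.columns.get_level_values(0).unique()
})

print(combined)
```

Using tuples instead of lists would only require wrapping each row in tuple(); the caveat about object dtype applies either way.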

"What would be the proper way to do this?"

There is no proper way. Pandas was never designed to hold lists in series / columns. You can concoct expensive workarounds, but these are not recommended.

The main reason holding lists in a series is not recommended is that you lose all the vectorised functionality that comes with numeric series backed by NumPy arrays held in contiguous memory blocks. Your series will be of object dtype, which represents a sequence of pointers to Python objects, so you lose the benefits in both memory and performance.
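The difference can be seen on the example data. This is a rough sketch rather than a benchmark; the variable names (`flat`, `nested_a`, and so on) are made up for illustration.

```python
import pandas as pd

bar = pd.DataFrame([
    {'a': 1, 'b': 2, 'grp': 0}, {'a': 3, 'b': 8, 'grp': 0},
    {'a': 2, 'b': 2, 'grp': 1}, {'a': 4, 'b': 5, 'grp': 1},
])

# Flat layout: every (column, statistic) pair is its own float64 column.
flat = bar.groupby('grp').agg(['mean', 'std'])
print(flat.dtypes)

# Vectorised arithmetic works directly on a numeric column.
a_std_doubled = flat[('a', 'std')] * 2

# Nested layout: mean and std packed into one list per row.
nested_a = pd.Series(flat['a'].to_numpy().tolist(), index=flat.index)
print(nested_a.dtype)  # object

# To compute with the std values, each list must first be unpacked
# element by element in Python.
a_std_unpacked = nested_a.map(lambda pair: pair[1])
```

Keeping the two-level columns, or flattening them to names such as a_mean and a_std, keeps the numeric dtype and avoids the unpacking step.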

See also "What are the advantages of NumPy over regular Python lists?" The arguments in favour of Pandas are the same as for NumPy.

 