简体   繁体   中英

Python - Group by and sum a list of tuples

Given the following list:

[
    ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)),
    ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0)),
    ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)),
    ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)),
    ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0))
]

I would like to group these by the first, second, fourth and fifth columns in the tuple and sum the 3rd. For this example I'll name the columns as col1, col2, col3, col4, col5.

In SQL I would do something like this:

select col1, col2, sum(col3), col4, col5 from my table
group by col1, col2, col4, col5

Is there a "cool" way to do this or is it all a manual loop?

You want itertools.groupby .

Note that groupby expects the input to be sorted, so you may need to do that before hand:

keyfunc = lambda t: (t[0], t[1], t[3], t[4])
data.sort(key=keyfunc)
for key, rows in itertools.groupby(data, keyfunc):
    print key, sum(r[2] for r in rows)
>>> [(x[0:2] + (sum(z[2] for z in y),) + x[2:5]) for (x, y) in
      itertools.groupby(sorted(L, key=operator.itemgetter(0, 1, 3, 4)),
      key=operator.itemgetter(0, 1, 3, 4))]
[
  ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0)),
  ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)),
  ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)),
  ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)),
  ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0))
]

(NOTE: output reformatted)

如果您发现自己对大型数据集执行了大量操作,则可能需要看一下pandas库,该库具有许多用于执行此类操作的好工具。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM