Split a list of tuples into sub-lists of the same tuple field

Question

I have a huge list of tuples in this format. The second field of each tuple is the category field.

    [(1, 'A', 'foo'),
    (2, 'A', 'bar'),
    (100, 'A', 'foo-bar'),

    ('xx', 'B', 'foobar'),
    ('yy', 'B', 'foo'),

    (1000, 'C', 'py'),
    (200, 'C', 'foo'),
    ..]

What is the most efficient way to break it down into sub-lists of the same category (A, B, C, etc.)?

Answer 1

Use itertools.groupby:

import itertools
import operator

data=[(1, 'A', 'foo'),
    (2, 'A', 'bar'),
    (100, 'A', 'foo-bar'),

    ('xx', 'B', 'foobar'),
    ('yy', 'B', 'foo'),

    (1000, 'C', 'py'),
    (200, 'C', 'foo'),
    ]

for key,group in itertools.groupby(data,operator.itemgetter(1)):
    print(list(group))

yields

[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]

Or, to create one list with each group as a sublist, you could use a list comprehension:

[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]

The second argument to itertools.groupby is a function that itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.

operator.itemgetter(1) picks off the second item in a sequence.

For example, if

row=(1, 'A', 'foo')

then

operator.itemgetter(1)(row)

equals 'A'.
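As a small check of the description above, the following sketch compares operator.itemgetter(1) with plain indexing and with an equivalent lambda. The name `get_category` is illustrative and not part of the original answer.

```python
import operator

row = (1, 'A', 'foo')
get_category = operator.itemgetter(1)

# itemgetter(1) returns a callable that indexes its argument with [1].
from_itemgetter = get_category(row)
from_indexing = row[1]
from_lambda = (lambda r: r[1])(row)

print(from_itemgetter, from_indexing, from_lambda)  # A A A
```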

As @eryksun points out in the comments, if the categories of the tuples appear in some random order, then you must sort data before applying itertools.groupby. This is because itertools.groupby only collects contiguous items with the same key into groups.

To sort the tuples by category, use:

data2=sorted(data,key=operator.itemgetter(1))
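To see why the sort matters, here is a minimal sketch that groups a few rows whose categories are interleaved, once without sorting and once after sorting. The data and the names `shuffled`, `unsorted_groups` and `sorted_groups` are made up for illustration.

```python
import itertools
import operator

# Rows whose categories are not contiguous (illustrative data).
shuffled = [
    (1, 'A', 'foo'),
    ('xx', 'B', 'foobar'),
    (2, 'A', 'bar'),
    (1000, 'C', 'py'),
    ('yy', 'B', 'foo'),
]

get_category = operator.itemgetter(1)

# Without sorting, every run of equal keys becomes its own group,
# so 'A' and 'B' are each split into two groups.
unsorted_groups = [list(g) for _, g in itertools.groupby(shuffled, get_category)]

# sorted() is stable, so rows keep their original relative order
# within each category.
sorted_groups = [
    list(g)
    for _, g in itertools.groupby(sorted(shuffled, key=get_category), get_category)
]

print(len(unsorted_groups))  # 5
print(len(sorted_groups))    # 3
```

Because the sort is stable, the rows inside each group stay in the order they had in the input.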

Answer 2

`collections.defaultdict`

itertools.groupby only groups contiguous items, so unless the input is already ordered by the key field you have to sort it first, incurring O(n log n) cost. For O(n) time complexity (dictionary operations are O(1) on average), you can use a defaultdict of lists:

from collections import defaultdict

dd = defaultdict(list)
for item in data:
    dd[item[1]].append(item)

res = list(dd.values())

print(res)

[[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')],
 [('xx', 'B', 'foobar'), ('yy', 'B', 'foo')],
 [(1000, 'C', 'py'), (200, 'C', 'foo')]]
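The same idea can be wrapped in a small function that also keeps the category keys. This is only a sketch: `group_by_field` is a hypothetical helper name, and the first-seen ordering of groups relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

def group_by_field(rows, index):
    """Group rows into lists keyed by row[index] (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    # Return a plain dict so that looking up a missing key raises KeyError
    # instead of silently creating an empty list.
    return dict(groups)

# Unsorted input is fine: no contiguity requirement here.
rows = [
    (1, 'A', 'foo'),
    ('xx', 'B', 'foobar'),
    (2, 'A', 'bar'),
]

by_category = group_by_field(rows, 1)
print(by_category['A'])   # [(1, 'A', 'foo'), (2, 'A', 'bar')]
print(list(by_category))  # ['A', 'B']
```

Calling list(by_category.values()) gives the list of sub-lists, as in the snippet above.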

Answer 3

To get one list per tuple position from a sequence of tuples:

foo = ((1,2), (3, 4), (5, 6), (7,8) , (9, 10))
[[z[i] for z in foo] for i in (0,1)]

If you prefer tuples instead of lists, use zip (in Python 3, zip returns an iterator, so wrap it in list()):

list(zip(*[(1,4),(2,5),(3,6)]))
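For comparison, here is a short sketch that runs both approaches on the same `foo` data. The names `columns_as_lists` and `columns_as_tuples` are illustrative. Note that this splits the tuples by position (column), not by category value.

```python
foo = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))

# One list per tuple position, using the nested list comprehension.
columns_as_lists = [[z[i] for z in foo] for i in (0, 1)]

# zip(*foo) transposes the rows into columns; list() materializes
# the iterator that zip returns in Python 3.
columns_as_tuples = list(zip(*foo))

print(columns_as_lists)   # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
print(columns_as_tuples)  # [(1, 3, 5, 7, 9), (2, 4, 6, 8, 10)]
```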

3 answers

Answer 1: 23, ACCEPTED, 2011-11-11 10:52:21
Answer 2: 2, 2019-01-24 16:54:31
Answer 3: 0, 2012-09-08 01:16:51
