I have a huge list of tuples in this format. The second field of each tuple is the category field.
[(1, 'A', 'foo'),
(2, 'A', 'bar'),
(100, 'A', 'foo-bar'),
('xx', 'B', 'foobar'),
('yy', 'B', 'foo'),
(1000, 'C', 'py'),
(200, 'C', 'foo'),
..]
What is the most efficient way to break it down into sub-lists of the same category (A, B, C, etc.)?
Use itertools.groupby:
import itertools
import operator
data=[(1, 'A', 'foo'),
(2, 'A', 'bar'),
(100, 'A', 'foo-bar'),
('xx', 'B', 'foobar'),
('yy', 'B', 'foo'),
(1000, 'C', 'py'),
(200, 'C', 'foo'),
]
for key, group in itertools.groupby(data, operator.itemgetter(1)):
    print(list(group))
yields
[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]
Or, to create one list with each group as a sublist, you could use a list comprehension:
[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]
The second argument to itertools.groupby is a function which itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.
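To make the "contiguous" part concrete, here is a small sketch. The list unsorted_rows is made up for illustration and is not part of the question's data. Because the two 'A' rows are not adjacent, the key 'A' shows up twice:

```python
import itertools
import operator

# Hypothetical input in which the two 'A' rows are separated by a 'B' row.
unsorted_rows = [(1, 'A', 'foo'), ('xx', 'B', 'foobar'), (2, 'A', 'bar')]

# Collect only the keys that groupby emits, in order.
keys = [key for key, group in itertools.groupby(unsorted_rows, operator.itemgetter(1))]
print(keys)  # ['A', 'B', 'A']
```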
operator.itemgetter(1) picks off the second item in a sequence.
For example, if row = (1, 'A', 'foo'), then operator.itemgetter(1)(row) equals 'A'.
As @eryksun points out in the comments, if the categories of the tuples appear in some random order, then you must sort data before applying itertools.groupby. This is because itertools.groupby only collects contiguous items with the same key into groups.
To sort the tuples by category, use:
data2=sorted(data,key=operator.itemgetter(1))
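Putting the sort and the grouping together might look like the sketch below. The sample list shuffled and the names get_category and groups are assumptions for illustration. It builds a dict keyed by category rather than a plain list of sublists:

```python
import itertools
import operator

# Hypothetical sample data with categories in no particular order.
shuffled = [
    (1000, 'C', 'py'),
    (1, 'A', 'foo'),
    ('xx', 'B', 'foobar'),
    (2, 'A', 'bar'),
    (200, 'C', 'foo'),
]

get_category = operator.itemgetter(1)

# Sort by category so equal keys become contiguous, then group.
# sorted() is stable, so items keep their relative order within a category.
groups = {
    key: list(group)
    for key, group in itertools.groupby(sorted(shuffled, key=get_category), key=get_category)
}

print(groups['A'])  # [(1, 'A', 'foo'), (2, 'A', 'bar')]
```

Only the category field is used as the sort key, so the mixed int and str values in the first field are never compared with each other.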
collections.defaultdict
itertools.groupby requires the input to be sorted by the key field; otherwise you will have to sort first, incurring O(n log n) cost. For O(n) time complexity (dictionary inserts take constant time on average), you can use a defaultdict of lists:
from collections import defaultdict
dd = defaultdict(list)
for item in data:
    dd[item[1]].append(item)
res = list(dd.values())
print(res)
[[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')],
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')],
[(1000, 'C', 'py'), (200, 'C', 'foo')]]
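The same idea can be wrapped in a small function. This is only a sketch: the helper name group_by_field and the sample list unsorted_data are hypothetical. It shows that no sorting is needed, and that on Python 3.7+ the groups come out in the order their keys are first seen:

```python
from collections import defaultdict

def group_by_field(items, index):
    """Group tuples by the value at position `index` (hypothetical helper)."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item[index]].append(item)
    # Return a plain dict so later lookups of missing keys raise KeyError.
    return dict(buckets)

# Hypothetical input with categories interleaved.
unsorted_data = [(1000, 'C', 'py'), (1, 'A', 'foo'), (200, 'C', 'foo'), ('xx', 'B', 'foobar')]

by_category = group_by_field(unsorted_data, 1)
print(by_category['C'])   # [(1000, 'C', 'py'), (200, 'C', 'foo')]
print(list(by_category))  # ['C', 'A', 'B']
```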
To get multiple lists of singletons from a list of tuples:
foo = ((1,2), (3, 4), (5, 6), (7,8) , (9, 10))
[[z[i] for z in foo] for i in (0,1)]
If you prefer to get multiple tuples of singletons (in Python 3, zip returns an iterator, so wrap it in list to see the tuples):
list(zip(*[(1,4),(2,5),(3,6)]))
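For comparison, here are both approaches applied to the same foo tuple, with the results they produce. The names as_lists and as_tuples are chosen here for illustration:

```python
foo = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))

# One list per position, built with a nested list comprehension.
as_lists = [[z[i] for z in foo] for i in (0, 1)]

# One tuple per position; list() materializes the zip iterator on Python 3.
as_tuples = list(zip(*foo))

print(as_lists)   # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
print(as_tuples)  # [(1, 3, 5, 7, 9), (2, 4, 6, 8, 10)]
```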