I have a list of lists in Python and I want to append to each sublist the number of times it appears in the nested list, as fast as possible (this is very important).
I have done this with a pandas DataFrame, but it seems to be very slow and I need to run these lines at a very large scale. I am completely willing to sacrifice readable code for efficient code.
For instance, my nested list is:
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
I need to have:
res = [[1, 3, 2, 2], [1, 3, 5, 1]]
EDIT: Order in res does not matter at all.
If order does not matter, you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands' solution:
from collections import Counter
l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)
Output
[[1, 3, 5, 1], [1, 3, 2, 2]]
This is quite an odd output to want, but it is of course possible. I suggest using collections.Counter(). No doubt others will make different suggestions, and a timeit-style comparison would reveal which is fastest for a particular data set:
>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]
Note: to preserve insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.
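For reference, the OrderedCounter recipe mentioned above can be sketched as a class that inherits from both Counter and OrderedDict. This is a minimal version that assumes you only need ordered iteration over the counts; on Python 3.7+ a plain Counter already keeps insertion order.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order in which keys were first seen."""
    pass


l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]

# Lists are unhashable, so each sublist is converted to a tuple before counting.
result = [list(k) + [v] for k, v in OrderedCounter(map(tuple, l)).items()]
print(result)
```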
If numpy is an option, you could use np.unique, setting axis to 0 and return_counts to True, and concatenate the unique rows and counts using np.vstack:
import numpy as np
l = np.array([[1, 3, 2], [1, 3, 2], [1, 3, 5]])
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T, c]).T
array([[1, 3, 2, 2],
       [1, 3, 5, 1]])
Since your items are mutable objects and you have to convert them to immutable objects to use them as mapping keys, an optimized approach is to use defaultdict() as follows:
In [5]: from collections import defaultdict
In [6]: d = defaultdict(int)
In [7]: for sub in l:
   ...:     d[tuple(sub)] += 1
   ...:
In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})
This gives you a dictionary with your sub-lists (as tuples) as the keys and their counts as the values.
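To get from that dictionary to the list-of-lists form asked for in the question, one more comprehension is enough. The sketch below wraps both steps in a helper; the name append_counts is only illustrative.

```python
from collections import defaultdict


def append_counts(rows):
    """Return each distinct sublist once, with its occurrence count appended."""
    counts = defaultdict(int)
    for sub in rows:
        # Tuples are hashable, so they can be used as dictionary keys.
        counts[tuple(sub)] += 1
    # Unpack each key back into a list and append its count.
    return [[*key, n] for key, n in counts.items()]


l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]
print(append_counts(l))
```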
Another way is to create your own dictionary object:
In [9]: class customdict(dict):
   ...:
   ...:     def __getitem__(self, key):
   ...:         try:
   ...:             val = super(customdict, self).__getitem__(key)
   ...:         except KeyError:
   ...:             # First time this key is seen: store the row with a count of 1.
   ...:             val = [*key, 1]
   ...:             self[key] = val
   ...:         else:
   ...:             # Key already present: bump the trailing count in place.
   ...:             val[-1] += 1
   ...:         return val
   ...:
In [10]: m = customdict()
In [11]: for sub in l:
   ...:     m[tuple(sub)]
   ...:
In [12]: m
Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}
In [13]: m.values()
Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])
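Since speed is the main concern, it is worth timing the candidates on data shaped like yours. The following is only a rough timeit sketch: the random test data and the function names with_counter and with_defaultdict are assumptions made for illustration, and the relative results may differ for your data.

```python
import random
import timeit
from collections import Counter, defaultdict


def with_counter(rows):
    # Count tuple versions of the rows, then append each count to its row.
    return [[*key, n] for key, n in Counter(map(tuple, rows)).items()]


def with_defaultdict(rows):
    counts = defaultdict(int)
    for sub in rows:
        counts[tuple(sub)] += 1
    return [[*key, n] for key, n in counts.items()]


# Hypothetical test data: 20,000 rows of three small integers, many duplicates.
random.seed(0)
data = [[random.randrange(5) for _ in range(3)] for _ in range(20000)]

for fn in (with_counter, with_defaultdict):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print("{}: {:.3f} s for 5 runs".format(fn.__name__, seconds))
```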