What is the fastest way to count occurrences of equal sublists in a nested list?

Question

I have a list of lists in Python, and I want to append to each sublist the number of times it appears in the nested list. Speed is very important: this has to be as fast as possible.

I have done this with a pandas DataFrame, but it seems very slow, and I need to run this code at very large scale. I am completely willing to sacrifice readable code for efficient code.

For instance, my nested list is:

l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]

I need to have:

res = [[1, 3, 2, 2], [1, 3, 5, 1]]

EDIT

Order in res does not matter at all.

Answer 1

If order does not matter, you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands's solution:

from collections import Counter

l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]

result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)

Output (the order of the sublists may differ depending on the Python version)

[[1, 3, 5, 1], [1, 3, 2, 2]]

Answer 2

This is quite an odd output to want, but it is of course possible. I suggest using collections.Counter(). No doubt others will make different suggestions, and a timeit-style comparison would reveal which is fastest for a particular data set:

>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple, l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]
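
The timeit-style comparison mentioned above could be sketched roughly as follows. The test data, the helper names with_counter and with_defaultdict, and the repetition count are all assumptions made for illustration; timings will depend heavily on the real data (number of sublists, their length, and how many are distinct).

```python
from collections import Counter, defaultdict
from timeit import timeit

# Hypothetical test data: 30,000 sublists, only 2 of them distinct.
data = [[1, 3, 2], [1, 3, 2], [1, 3, 5]] * 10_000


def with_counter(rows):
    # Counter over tuple versions of the sublists.
    return [[*t, n] for t, n in Counter(map(tuple, rows)).items()]


def with_defaultdict(rows):
    # Explicit loop that increments a defaultdict(int).
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row)] += 1
    return [[*t, n] for t, n in counts.items()]


# Both approaches must agree before their timings are worth comparing.
assert sorted(with_counter(data)) == sorted(with_defaultdict(data))

for fn in (with_counter, with_defaultdict):
    seconds = timeit(lambda: fn(data), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```

No timings are quoted here on purpose: run the sketch on data shaped like your own before drawing conclusions.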

Note: to preserve insertion order prior to CPython 3.6 / Python 3.7, use the OrderedCounter recipe.
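
For reference, a minimal sketch of an OrderedCounter, along the lines of the recipe that combines Counter with OrderedDict. The class body below is an illustrative assumption rather than a copy of any official recipe, and the input list is reordered here only to make the preserved order visible.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order in which keys were first seen."""


# [1, 3, 5] is seen first, so it should also come first in the result.
l = [[1, 3, 5], [1, 3, 2], [1, 3, 2]]
ordered = [list(k) + [v] for k, v in OrderedCounter(map(tuple, l)).items()]
print(ordered)  # [[1, 3, 5, 1], [1, 3, 2, 2]]
```

On Python 3.7 and later a plain Counter already iterates in insertion order, so this subclass is only needed on older interpreters.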

Answer 3

If numpy is an option, you could use np.unique with axis set to 0 and return_counts set to True, and concatenate the unique rows and counts using np.vstack:

import numpy as np

l = np.array([[1, 3, 2], [1, 3, 2], [1, 3, 5]])
# x holds the unique rows (sorted), c holds how often each row occurs
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T, c]).T

array([[1, 3, 2, 2],
       [1, 3, 5, 1]])

Answer 4

Since your items are mutable objects, you have to convert them to immutable objects before they can be used as mapping keys. An optimized approach is to use defaultdict() as follows:

In [5]: from collections import defaultdict

In [6]: d = defaultdict(int)

In [7]: for sub in l:
   ...:     d[tuple(sub)] += 1
   ...:     

In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})

This will give you a dictionary with your sublists (converted to tuples) as the keys and their counts as the values.
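
To get from that dictionary to the layout asked for in the question, one possible final step is a list comprehension over its items. This is a sketch only; the name res is taken from the question, and the rest repeats the loop above so the block runs on its own.

```python
from collections import defaultdict

l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]

# Count each sublist, using its tuple form as the dictionary key.
d = defaultdict(int)
for sub in l:
    d[tuple(sub)] += 1

# Unpack each tuple key and append its count to build the requested rows.
res = [[*key, count] for key, count in d.items()]
print(res)
```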

Another way is to create your own dictionary object:

 In [9]: class customdict(dict):
    ...:     # Every lookup counts as one occurrence of the key.
    ...:     def __getitem__(self, key):
    ...:         try:
    ...:             val = super().__getitem__(key)
    ...:         except KeyError:
    ...:             # First occurrence: store the sublist with a count of 1.
    ...:             val = self[key] = [*key, 1]
    ...:         else:
    ...:             val[-1] += 1  # the stored list is updated in place
    ...:         return val
    ...:

 In [10]: m = customdict()

 In [11]: for sub in l:
     ...:     m[tuple(sub)]
     ...:     

 In [12]: m
 Out[12]: {(1, 3, 2): [1, 3, 2, 2], (1, 3, 5): [1, 3, 5, 1]}

 In [13]: m.values()
 Out[13]: dict_values([[1, 3, 2, 2], [1, 3, 5, 1]])

solution1
10 ACCEPTED 2019-01-25 10:51:06

solution2
8 2019-01-25 10:50:12

solution3
1 2019-01-25 10:53:16

solution4
0 2019-01-25 13:24:44
