
What is the fastest way to count occurrences of equal sublists in a nested list?

I have a list of lists in Python and I want to append to each sublist the number of times it appears in the nested list, as fast as possible (this is very important).

I have done that with a pandas DataFrame, but it seems very slow and I need to run this code on a very large scale. I am completely willing to sacrifice readable code for efficient code.

For instance, my nested list is:

l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

I need to have:

res = [[1, 3, 2, 2], [1, 3, 5, 1]]

EDIT

Order in res does not matter at all.

If order does not matter, you could use collections.Counter with extended iterable unpacking, as a variant of @Chris_Rands' solution:

from collections import Counter

l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]

result = [[*t, count] for t, count in Counter(map(tuple, l)).items()]
print(result)

Output

[[1, 3, 5, 1], [1, 3, 2, 2]]

This is quite an odd output to want, but it is of course possible. I suggest using collections.Counter(). No doubt others will make different suggestions, and a timeit-style comparison would reveal which is fastest for a particular data set:

>>> from collections import Counter
>>> l = [[1, 3, 2], [1, 3, 2] ,[1, 3, 5]]
>>> [list(k) + [v] for k, v in Counter(map(tuple,l)).items()]
[[1, 3, 2, 2], [1, 3, 5, 1]]

Note: to preserve insertion order on versions before CPython 3.6 / Python 3.7, use the OrderedCounter recipe.
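For reference, a minimal sketch of that recipe, as it appeared in the collections documentation, combines Counter with OrderedDict. The variable names counts and result are only illustrative:

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order in which keys were first seen."""

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)


l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]

# Lists are unhashable, so each sublist is converted to a tuple first.
counts = OrderedCounter(map(tuple, l))

# Append each count to its sublist, in first-seen order.
result = [list(k) + [v] for k, v in counts.items()]
print(result)
```

On Python 3.7 and later a plain Counter already keeps insertion order, so the subclass is only needed on older interpreters.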

If NumPy is an option, you could use np.unique with axis=0 and return_counts=True, then join the unique rows and their counts with np.vstack (this assumes all sublists have the same length, so that they form a 2-D array):

import numpy as np

l = np.array([[1, 3, 2], [1, 3, 2], [1, 3, 5]])
x, c = np.unique(l, axis=0, return_counts=True)
np.vstack([x.T, c]).T

array([[1, 3, 2, 2],
       [1, 3, 5, 1]])

Since your sublists are mutable, they have to be converted to an immutable type (such as a tuple) before they can be used as mapping keys. An efficient approach is to use defaultdict() as follows:

In [5]: from collections import defaultdict

In [6]: d = defaultdict(int)

In [7]: for sub in l:
   ...:     d[tuple(sub)] += 1
   ...:     

In [8]: d
Out[8]: defaultdict(int, {(1, 3, 2): 2, (1, 3, 5): 1})

This will give you a dictionary of your sub-lists as the key and their counts as the value.
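To get from that dictionary to the list format asked for in the question, and to check the speed claim on your own data, something like the following sketch may help. The function names with_defaultdict and with_counter and the synthetic list big are made up for illustration, and the timings will depend on your data and Python version:

```python
from collections import Counter, defaultdict
from timeit import timeit


def with_defaultdict(nested):
    """Count sublists with a defaultdict, then append each count."""
    counts = defaultdict(int)
    for sub in nested:
        counts[tuple(sub)] += 1
    return [[*key, n] for key, n in counts.items()]


def with_counter(nested):
    """Same result using collections.Counter."""
    return [[*key, n] for key, n in Counter(map(tuple, nested)).items()]


l = [[1, 3, 2], [1, 3, 2], [1, 3, 5]]
res = with_defaultdict(l)
print(res)

# Synthetic larger input (an assumption, not the asker's data) for timing.
big = [[i % 100, 3, 2] for i in range(100_000)]
for fn in (with_defaultdict, with_counter):
    print(fn.__name__, timeit(lambda: fn(big), number=5))
```

Both functions return the same sublists with their counts; which one is faster is best measured on input that resembles the real workload.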

Another way is to create your own dictionary subclass in which every lookup counts as one occurrence. Because any m[key] access increments the count (including the accesses IPython makes when it pretty-prints m itself), read the results through m.values():

In [9]: class customdict(dict):
   ...:     def __getitem__(self, key):
   ...:         try:
   ...:             val = super().__getitem__(key)
   ...:         except KeyError:
   ...:             # First lookup: store the sub-list with a count of 1.
   ...:             val = self[key] = [*key, 1]
   ...:         else:
   ...:             # Later lookups: increment the count in place.
   ...:             val[-1] += 1
   ...:         return val
   ...:

In [10]: m = customdict()

In [11]: for sub in l:
    ...:     m[tuple(sub)]
    ...:

In [12]: list(m.values())
Out[12]: [[1, 3, 2, 2], [1, 3, 5, 1]]
