How to optimize this Python code?

import random

def maxVote(nLabels):
    count = {}
    maxList = []
    maxCount = 0
    for nLabel in nLabels:
        if nLabel in count:
            count[nLabel] += 1
        else:
            count[nLabel] = 1
        # Check if the count is max
        if count[nLabel] > maxCount:
            maxCount = count[nLabel]
            maxList = [nLabel]
        elif count[nLabel] == maxCount:
            maxList.append(nLabel)
    return random.choice(maxList)

nLabels is a list of integers.

The function returns the integer with the highest frequency. If several integers share the highest frequency, one of them is chosen at random.

E.g. maxVote([1,3,4,5,5,5,3,12,11]) returns 5.

import random
import collections

def maxvote(nlabels):
  cnt = collections.defaultdict(int)
  for i in nlabels:
    cnt[i] += 1
  maxv = max(cnt.itervalues())
  return random.choice([k for k,v in cnt.iteritems() if v == maxv])

print maxvote([1,3,4,5,5,5,3,3,11])

In Python 2.7 or 3.1 and later you can use Counter:

>>> from collections import Counter
>>> Counter([1,3,4,5,5,5,3,12,11]).most_common(1)
[(5, 3)]
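
Note that most_common(1) returns a single entry, so it does not pick at random among tied labels as the question requires. A minimal sketch that keeps the random tie-break, written for Python 3 and assuming a non-empty input (the function name max_vote_counter is illustrative, not from the original answer):

```python
import random
from collections import Counter

def max_vote_counter(labels):
    """Return one of the most frequent labels, chosen at random among ties."""
    counts = Counter(labels)
    # most_common(1) returns [(label, count)]; only the count is needed here.
    top_count = counts.most_common(1)[0][1]
    # Collect every label that reaches the top count, then pick one at random.
    winners = [label for label, n in counts.items() if n == top_count]
    return random.choice(winners)

print(max_vote_counter([1, 3, 4, 5, 5, 5, 3, 12, 11]))
```

An empty list raises IndexError in this sketch, since most_common(1) is then empty.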

If you don't have access to those versions of Python you could do:

>>> from collections import defaultdict
>>> nLabels = [1,3,4,5,5,5,3,12,11]
>>> d = defaultdict(int)
>>> for i in nLabels:
...     d[i] += 1
...
>>> max(d, key=lambda x: d[x])
5

It appears to run in O(n) time. The check `nLabel in count` would be a bottleneck if count were a list, because membership testing in a list is itself O(n), which would make the whole function O(n^2). Since count is a dictionary, the check is O(1) on average and the loop stays O(n).

Using a dictionary rather than a list for the counts is the only major efficiency boost I can spot, and the code already does that.
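
To make the difference concrete, here is a small sketch of the two storage choices: a list of pairs, where every lookup scans the list, and a dict, where `in` is a hash lookup. It is written for Python 3, and both function names are illustrative.

```python
def count_with_list(labels):
    """Counts kept in a list of [label, count] pairs: each lookup scans the list."""
    pairs = []
    for label in labels:
        for pair in pairs:            # linear scan over the distinct labels seen so far
            if pair[0] == label:
                pair[1] += 1
                break
        else:                         # no break: label not seen before
            pairs.append([label, 1])
    return dict(pairs)

def count_with_dict(labels):
    """Counts kept in a dict: membership is a hash lookup, O(1) on average."""
    counts = {}
    for label in labels:
        if label in counts:
            counts[label] += 1
        else:
            counts[label] = 1
    return counts

votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]
print(count_with_list(votes) == count_with_dict(votes))
```

Both produce the same counts. Only the cost per lookup differs, and that matters when there are many distinct labels.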

I'm not sure what exactly you want to optimize, but this should work:

import random
from collections import defaultdict

def maxVote(nLabels):
    count = defaultdict(int)
    for nLabel in nLabels:
        count[nLabel] += 1
    maxCount = max(count.itervalues())
    maxList = [k for k in count if count[k] == maxCount]
    return random.choice(maxList)

Idea 1

Does the result really need to be random, or can you return any one of the most frequent labels? If an arbitrary one is acceptable, you could store a single label and remove the list logic, including this branch:

        elif count[nLabel]==maxCount:
            maxList.append(nLabel)
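
A minimal sketch of that simplification, assuming any one of the most frequent labels is acceptable. It is written for Python 3, and the name max_vote_first is hypothetical. It returns the label that first reaches the highest count, so the result is fixed for a given input order.

```python
def max_vote_first(labels):
    """Return the label that first reaches the highest count (None if empty)."""
    counts = {}
    best_label = None
    best_count = 0
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
        # Strictly greater: a later label that only ties does not replace the leader.
        if counts[label] > best_count:
            best_count = counts[label]
            best_label = label
    return best_label

print(max_vote_first([1, 3, 4, 5, 5, 5, 3, 12, 11]))
```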

Idea 2

If this method is called frequently, would it be possible to work only on new data rather than the entire data set? You could cache the count map and process only the labels that arrived since the last call. Assuming the data set is large and the calculations are done online, this could yield large improvements.
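
A rough sketch of the caching idea, written for Python 3. The class RunningVote and its method names are hypothetical and only illustrate keeping the counts between calls.

```python
import random
from collections import Counter

class RunningVote:
    """Keeps label counts between calls so only new labels are processed."""

    def __init__(self):
        self.counts = Counter()

    def add(self, new_labels):
        # Only the newly arrived labels are counted; earlier counts are reused.
        self.counts.update(new_labels)

    def winner(self):
        if not self.counts:
            raise ValueError("no labels seen yet")
        top = max(self.counts.values())
        # Random tie-break among all labels that share the top count.
        return random.choice([k for k, n in self.counts.items() if n == top])

votes = RunningVote()
votes.add([1, 3, 4, 5, 5])
votes.add([5, 3, 12, 11])
print(votes.winner())
```

In this sketch winner() still scans the distinct labels on each call, so the saving comes from not recounting old data.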

Complete example:

#!/usr/bin/env python

def max_vote(l):
    """
    Return the element with the (or a) maximum frequency in ``l``.
    """
    unsorted = [(a, l.count(a)) for a in set(l)]
    return sorted(unsorted, key=lambda x: x[1]).pop()[0]

if __name__ == '__main__':
    votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]
    print max_vote(votes)
    # => 5

Benchmarks:

#!/usr/bin/env python

import random
import collections

def max_vote_2(l):
    """
    Return the element with the (or a) maximum frequency in ``l``.
    """
    unsorted = [(a, l.count(a)) for a in set(l)]
    return sorted(unsorted, key=lambda x: x[1]).pop()[0]

def max_vote_1(nlabels):
    cnt = collections.defaultdict(int)
    for i in nlabels:
        cnt[i] += 1
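        # Note: as benchmarked, this max() sits inside the loop, so it is
        # recomputed for every element rather than once after counting.
        maxv = max(cnt.itervalues())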
    return random.choice([k for k,v in cnt.iteritems() if v == maxv])

if __name__ == '__main__':
    from timeit import Timer
    votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]
    print max_vote_1(votes)
    print max_vote_2(votes)

    t = Timer("votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]; max_vote_2(votes)", \
        "from __main__ import max_vote_2")
    print 'max_vote_2', t.timeit(number=100000)

    t = Timer("votes = [1, 3, 4, 5, 5, 5, 3, 12, 11]; max_vote_1(votes)", \
        "from __main__ import max_vote_1")
    print 'max_vote_1', t.timeit(number=100000)

Yields:

5
5
max_vote_2 1.79455208778
max_vote_1 2.31705093384
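
These timings are for a nine-element list. The list.count approach does one full pass per distinct label, so the comparison is worth repeating on larger inputs. Below is a sketch of such a harness, written for Python 3. The function names, input sizes, and repeat count are arbitrary assumptions, and no timings are quoted for it.

```python
import random
import timeit
from collections import Counter

def vote_count_method(labels):
    """list.count once per distinct label, as in max_vote_2: O(n * k)."""
    return max(set(labels), key=labels.count)

def vote_counter(labels):
    """Single counting pass with Counter: O(n)."""
    return Counter(labels).most_common(1)[0][0]

def bench(func, labels, number=5):
    """Total seconds for `number` calls of func(labels)."""
    return timeit.timeit(lambda: func(labels), number=number)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so both functions see the same data
    for distinct in (5, 200):
        labels = [rng.randrange(distinct) for _ in range(10000)]
        for func in (vote_count_method, vote_counter):
            print(distinct, func.__name__, bench(func, labels))
```

Neither function breaks ties at random, so they can return different labels when several share the top count.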
