What's the most efficient way to perform a multiple-match lookup in a Python dictionary?

I'm looking to maximally optimize the runtime for this chunk of code:

aDictionary= {"key":["value", "value2", ...

rests = \
         list(map((lambda key: Resp(key=key)),
                     [key for key, values in
                      aDictionary.items() if (test1 in values or test2 in values)]))

I'm using Python 3 and am willing to throw as much memory at it as possible.

I'm considering running the two dictionary lookups in separate processes for a speedup (does that make sense?). Any other optimization ideas are welcome.


  • values can definitely be sorted and turned into a set; it is precomputed and very, very large.
  • len(values) >>>> len(tests) always holds, though both are growing over time.
  • len(tests) grows very, very slowly, and tests has new values for each execution.
  • The values are currently strings (I'm considering a string->integer mapping).

For starters, there is no reason to use map when you are already using a list comprehension, so you can remove that, as well as the outer list call:

rests = [Resp(key=key) for key, values in aDictionary.items()
         if (test1 in values or test2 in values)]

A second possible optimization is to turn each list of values into a set. The conversion takes time up front, but it changes your lookups (the uses of in) from linear time to constant time on average. You might want a separate helper function for that, something like:

def anyIn(checking, checkingAgainst):
    checkingAgainst = set(checkingAgainst)
    for val in checking:
        if val in checkingAgainst:
            return True
    return False

Then you could change the end of your list comprehension to read

...if anyIn([test1, test2], values)]

But again, this would probably only be worth it if you were checking more than two values, or if the lists in values are very long.
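Note that the helper above rebuilds the set on every call, so each check still costs linear time. Since the question says the values are precomputed, here is a minimal sketch of paying that conversion once up front instead. The sample data and the names a_dictionary, value_sets and any_in are made up for illustration, and this only pays off if the converted dictionary is reused across many queries.

```python
# Hypothetical sample data; the real dictionary is far larger.
a_dictionary = {
    "k1": ["a", "b", "c"],
    "k2": ["d", "g"],
    "k3": ["c", "f"],
}

# One-time conversion: cost is proportional to the total number of values,
# but it is paid once rather than on every query.
value_sets = {key: set(values) for key, values in a_dictionary.items()}


def any_in(checking, checking_against):
    """Return True if any item of `checking` is in the set `checking_against`."""
    return any(val in checking_against for val in checking)


test1, test2 = "c", "e"
matching_keys = [key for key, values in value_sets.items()
                 if any_in([test1, test2], values)]
```

With the sets built ahead of time, each membership check inside any_in is a hash lookup rather than a scan of the list.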

If tests are sufficiently numerous, it will surely pay off to switch to set operations:

tests = set([test1, test2, ...])
resps = map(Resp, (k for k, values in dic.items() if not tests.isdisjoint(values)))  
# resps is a lazy iterable, not a list, and it uses a
# generator inside, thus saving the overhead of building
# the inner list.

Turning the dict values into sets would not gain anything, as the conversion would be O(N), with N being the combined size of all the values lists, whereas the isdisjoint call above only iterates over each values list until it encounters one of the tests, checking each element against the tests set with an O(1) lookup.
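As a small sketch of that behaviour, with made-up data: set.isdisjoint accepts any iterable, so the values can stay as plain lists. The second half uses an iterator to show that, at least in CPython, the scan stops at the first element found in tests.

```python
# Hypothetical sample data standing in for the real dictionary.
tests = {"c", "e"}
dic = {
    "k1": ["a", "b", "c"],
    "k2": ["d", "g"],
    "k3": ["e"],
}

# isdisjoint takes any iterable; no need to convert the lists to sets.
matching = [k for k, values in dic.items() if not tests.isdisjoint(values)]

# Early exit: "c" is the first element and is in tests, so the rest
# of the iterator is left unconsumed (CPython behaviour).
it = iter(["c", "x", "y"])
found = not tests.isdisjoint(it)
remaining = list(it)
```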

map may be faster than a comprehension if you do not need a lambda, e.g. if key can be passed as the first positional argument to Resp's __init__, but certainly not with the lambda (see Python List Comprehension Vs. Map). Otherwise, a generator or comprehension will be better:

resps = (Resp(key=k) for k, values in dic.items() if not tests.isdisjoint(values))
#resps = [Resp(key=k) for k, values in dic.items() if not tests.isdisjoint(values)]
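To make the three spellings concrete, here is a sketch using a stand-in Resp. The real Resp class is not shown in the question, so this dataclass is purely an assumption: it takes key as its first positional argument, which is what map(Resp, ...) relies on.

```python
from dataclasses import dataclass


@dataclass
class Resp:
    """Hypothetical stand-in for the question's Resp class."""
    key: str


keys = ["k1", "k3"]

via_map = list(map(Resp, keys))                           # no lambda needed
via_lambda = list(map(lambda key: Resp(key=key), keys))   # extra call per item
via_comprehension = [Resp(key=k) for k in keys]

# A generator is lazy and can only be consumed once.
lazy = (Resp(key=k) for k in keys)
first_pass = list(lazy)
second_pass = list(lazy)  # already exhausted, so this is empty
```

All three forms build the same objects. If the result has to be iterated more than once, the list comprehension is the safer choice.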
