简体   繁体   中英

Can sum() be used like a hash

I want to check whether the sum of two list is same or not.

lst1 = [1 2 3 4]
lst2 = [0 3 3 4]
if sum(lst1) == sum(lst2):
    return true

Here the sum returns true. If I hash the list I get different values but hashing is computationally expensive. I only want to check whether the elements in the two lists are same (only with the exception of 1 element) . I am using Binary search technique of dividing the list and then checking the hash. if hashes are different I am checking if its more than once. But as I said hashing is computationally expensive. Also the order does matter here . Thank you

First, is is not the logically correct syntax for your application; use == instead. Refer this post to know more about the difference: Is there a difference between `==` and `is` in Python?

def checksum(lst1, lst2):
    return sum(lst1) == sum(lst2):

list1 = [1, 2, 3, 4]
list2 = [0, 3, 3, 4]
checksum(list1,list2)

This is the correct code to compare a checksum (and is obviously true in this example).

If you are looking for something different, like finding whether the items of the list are the same element-wise, then the straight-forward answer would be

return lst1 == lst2

is checks for object identity, not equality ( link ).

sum does not differentiate between the elements that are summed over only the end result. If you need to differentiate between the elements of the containers you can use a set

set(lst1) == set(lst2) # ignores counts of elements

or if you're sensitive to the counts of the elements use collections.Counter

collections.Counter(lst2) == collections.Counter(lst1) # False
collections.Counter(lst1) == collections.Counter(lst1) # True

this requires you to iterate over the list.


Depending on what exactly you're after you could also you

hash(tuple(lst1)) == hash(tuple(lst2))

This checks both the elements contained and their order (maybe you don't care about order) and is pretty efficient. You have to do the cast to a tuple as list is not immutable and therefore can not be hashed.

hash(tuple) seems to be much faster, but it's not clear to me what exactly your end goal is

%timeit hash(tuple(lst1)) == hash(tuple(lst2))
1000000 loops, best of 3: 408 ns per loop

%timeit collections.Counter(lst2) == collections.Counter(lst1)
100000 loops, best of 3: 5.58 µs per loop

Can sum be used like a hash?

Yes, the sum() function can be used as a hash for a list or tuple of integers.

Is hashing is computationally expensive?

Interestingly, hash() performs better than sum() for a tuple of integers :

$ python2.7 -m timeit -s "data = (1, 2, 3, 4)" "sum(data)"
10000000 loops, best of 3: 0.113 usec per loop

$ python2.7 -m timeit -s "data = (1, 2, 3, 4)" "hash(data)"
10000000 loops, best of 3: 0.0569 usec per loop

How to compare unordered lists

The stated problem is "how to compare elements in the two lists are same (only with the exception of 1 element)"

A built-in tool for doing this is collections.Counter() . Internally, it uses fast hashing to accumulate counts. In Python 3, it is accelerated by code written in C. It uses a single O(n) pass rather than an O(log n) binary search.

>>> from collections import Counter
>>> list1 = [1, 2, 3, 4]
>>> list2 = [0, 3, 3, 4]

>>> c1 = Counter(list1)
>>> c2 = Counter(list2)
>>> c1 == c2
False
>>> c1 - c2
Counter({1: 1, 2: 1})
>>> c2 - c1
Counter({0: 1, 3: 1})

The difference in the various approaches will become more pronounced as the list size grows larger.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM