I have some dictionaries that are generated dynamically with collections.defaultdict. They look like this:

a = defaultdict(list, {'speed_limit': [('0', '70')]})
b = defaultdict(list, {'speed_limit': [('0', '70'), ('0', '60'), ('0', '50')],
                       'road_obstacles': [('0', '8')]})
What I want
Print nothing if a is contained in b, which is true in the case above; only print when the keys or the values inside them differ. Here a has 1 tuple and b has 3 tuples, but a's tuple is among b's, so no difference should be reported.
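In other words, the condition is: every (key, tuple) pair of a must also appear in b. A minimal sketch of that check on the example data (the name "contained" is just illustrative):

```python
from collections import defaultdict

a = defaultdict(list, {'speed_limit': [('0', '70')]})
b = defaultdict(list, {'speed_limit': [('0', '70'), ('0', '60'), ('0', '50')],
                       'road_obstacles': [('0', '8')]})

# a is "contained" in b when every tuple under every key of a
# also appears under the same key in b
contained = all(t in b[key] for key, tuples in a.items() for t in tuples)
print(contained)  # True for the example above
```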
What I tried
I tried a very conservative nested-loop approach, which works but is not efficient. It also fails to handle the case when the structures I am comparing get more complex.
This is what I tried; it is very inefficient for large structures:

for key, value in a.items():
    for key1, value1 in b.items():
        if key != key1:
            print("keys don't match", key, value, key1, value1)
        elif value == value1:  # keys match, values match
            print("key and value match", key, value, key1, value1)
        else:  # keys match, values differ
            print("key matches but value differs", key, value, key1, value1)
Currently you're iterating over both dictionaries, essentially generating a Cartesian product. It sounds to me like what you really want is a union.
The union operator is |. It works on sets. To find the union of all the keys in the two dictionaries, use set(a.keys()) | set(b.keys()).
Edit: ivan_pozdeev points out that the same set can be computed faster as set(a) | set(b).
You can then iterate over that set once, checking key in a, key in b, and, where necessary, whether the values have common elements (set(a_value) & set(b_value)). Here's an example:
all_keys = set(a.keys()) | set(b.keys())
for k in all_keys:
    if k in a:
        if k in b:
            print("Key is in both dictionaries:", k)
            a_value, b_value = a[k], b[k]
            if set(a_value) & set(b_value):
                print("Values match")
            else:
                print("Values do not match")
        else:
            print("Key is in a but not b:", k)
    else:
        print("Key is in b but not a:", k)
That's just one way of doing it. Another way would be to calculate three sets: set(a.keys()) - set(b.keys()) for the keys in a but not in b, set(b.keys()) - set(a.keys()) for the keys in b but not in a, and set(a.keys()) & set(b.keys()) for the keys that are in both dictionaries.
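A minimal sketch of that three-set variant, using the example dictionaries from the question:

```python
from collections import defaultdict

a = defaultdict(list, {'speed_limit': [('0', '70')]})
b = defaultdict(list, {'speed_limit': [('0', '70'), ('0', '60'), ('0', '50')],
                       'road_obstacles': [('0', '8')]})

only_a = set(a.keys()) - set(b.keys())  # keys in a but not in b
only_b = set(b.keys()) - set(a.keys())  # keys in b but not in a
common = set(a.keys()) & set(b.keys())  # keys present in both

for k in only_a:
    print("Key is in a but not b:", k)
for k in only_b:
    print("Key is in b but not a:", k)
for k in common:
    # check whether the value lists share at least one tuple
    if set(a[k]) & set(b[k]):
        print("Values match for key:", k)
    else:
        print("Values do not match for key:", k)
```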
For set operations, the fastest approach should be to convert the dicts to sets and use the operations from the built-in set implementation (as they are implemented in C):
>>> [set((k, iv) for k, v in var.items() for iv in v) for var in (a, b)]
[{('speed_limit', ('0', '70'))},
 {('road_obstacles', ('0', '8')),
  ('speed_limit', ('0', '50')),
  ('speed_limit', ('0', '60')),
  ('speed_limit', ('0', '70'))}]
>>> sa, sb = _
>>> sb > sa
True
>>> sb - sa
{('road_obstacles', ('0', '8')),
 ('speed_limit', ('0', '50')),
 ('speed_limit', ('0', '60'))}
>>> sa - sb
set()
Cons: an initial conversion step with a loop, although it is optimization-friendly (maybe you're better off storing the data as sets from the start? The decision depends on how often you need to perform set-friendly vs dict-friendly operations).
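Wrapped as a function (a sketch in Python 3; items() replaces Python 2's iteritems(), and the function name as_pairs is just illustrative), the containment test becomes a single subset check, which prints nothing when a is contained in b:

```python
from collections import defaultdict

def as_pairs(d):
    """Flatten a dict of lists into a set of (key, element) pairs."""
    return {(k, iv) for k, v in d.items() for iv in v}

a = defaultdict(list, {'speed_limit': [('0', '70')]})
b = defaultdict(list, {'speed_limit': [('0', '70'), ('0', '60'), ('0', '50')],
                       'road_obstacles': [('0', '8')]})

sa, sb = as_pairs(a), as_pairs(b)
if not sa <= sb:  # subset test: print nothing when a is contained in b
    print("in a but not b:", sa - sb)
    print("in b but not a:", sb - sa)
```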
A timeit test on the variables in your example shows the times are roughly the same:

def loops(): <your code>
def sets(): <my code up to sa, sb>
# a and b are taken from the interactive namespace

In [85]: timeit sets
The slowest run took 14.36 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 253 ns per loop

In [86]: timeit loops
The slowest run took 13.23 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 253 ns per loop
A timeit test on a randomized example with ~1000 elements shows that my code appears to start outperforming yours, but the variance is high:

In [64]: alphabet='abcdefghijklmnopqrstuvwxyz_'
In [41]: gen_word=lambda: ''.join(random.choice(alphabet) for i in range(random.randrange(0, 15)))
In [66]: a={gen_word(): [tuple(random.randrange(100) for _ in range(2)) for _ in range(random.randrange(10))] for _ in range(1000)}
In [67]: b=a.copy()
In [69]: b.update({gen_word(): [tuple(random.randrange(100) for _ in range(2)) for _ in range(random.randrange(10))] for _ in range(500)})
In [74]: a.update({gen_word(): [tuple(random.randrange(100) for _ in range(2)) for _ in range(random.randrange(10))] for _ in range(200)})
In [70]: len(b)
Out[70]: 1336
In [75]: len(a)
Out[75]: 1067
In [76]: timeit loops
The slowest run took 9.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 337 ns per loop
In [77]: timeit sets
The slowest run took 14.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 252 ns per loop
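One caveat on the timings above: in IPython, timeit sets (without parentheses) times only the lookup of the name sets, not the call, which would explain why both measurements land at roughly the same ~250 ns. To time the calls themselves, one can use the timeit module directly (a sketch; sets here stands for the set-conversion function defined earlier):

```python
import timeit
from collections import defaultdict

a = defaultdict(list, {'speed_limit': [('0', '70')]})
b = defaultdict(list, {'speed_limit': [('0', '70'), ('0', '60'), ('0', '50')],
                       'road_obstacles': [('0', '8')]})

def sets():
    return [{(k, iv) for k, v in var.items() for iv in v} for var in (a, b)]

# note the call sets() inside the lambda: we time the call, not the name lookup
t = timeit.timeit(lambda: sets(), number=10_000)
print(f"{t / 10_000 * 1e9:.0f} ns per call")
```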