Faster way to do a correspondence replace operation in python?

Question

I am not sure I'm using the right term for this---I'd call this a merge operation maybe? Simple matching?

I have two lists of dictionaries. Each entry in the first one contains a list of tag IDs. The second one is a correspondence between tag IDs and tag names. I want to match the IDs and add the tag names to the entries of the first list.

So, first dictionary looks like this:

>>> myjson
[
{"tags" : ["1","3"],"otherdata" : "blah"},
{"tags" : ["2","4"],"otherdata" : "blah blah"}
]

Second dictionary looks like this:

>>> tagnames
[
{"id": "1", "name":"bassoon"},
{"id": "2", "name":"banjo"},
{"id": "3", "name":"paw paw"},
{"id": "4", "name":"foxes"}
]

To replace the tag IDs in myjson with the tag ID names, I am currently doing this:

data = []
for j in myjson:
    d = j
    d['tagnames'] = [i['name'] for i in tagnames for y in d['tags'] if y==i['id']]
    data.append(d)

My desired output is this:

>>> data
[
{"tags" : ["1","3"],"otherdata" : "blah", "tagnames" : ["bassoon","paw paw"]},
{"tags" : ["2","4"],"otherdata" : "blah blah", "tagnames": ["banjo","foxes"]}
]

I'm getting the right output, but it seems really slow. I get that it's doing full iterations of each element in myjson x full iterations of each element in tagnames (is that mxn? nxn?) every time and that that will be slow, but maybe there is a smarter syntax or tricks for speeding it up? Walk the array just once instead of n times?

Oooh, also, it would be cool if someone could suggest a way to do this assignment with a slick map or functional approach rather than the outer for loop.

Answer 1

You want to transform your tagnames list into a dictionary:

tagnames_map = {t['id']: t['name'] for t in tagnames}

Now you can find matching tag names much faster. Your code already modifies the dictionaries in place (d = j does not make a copy), so I'll simplify it to:

for d in myjson:
    d['tagnames'] = [tagnames_map[t] for t in tagnames_map.viewkeys() & d['tags']]

The dict.viewkeys() method returns a dictionary view object which acts like a set. We intersect that view with your list of tags, which yields the set of tags that are present in tagnames_map. By doing this we don't have to worry about any tags that are missing from the map. Because the result is a set, the order of the names is not guaranteed to follow the order of d['tags'].

If you are using Python 3, you can use tagnames_map.keys() directly; in Python 3 the .keys(), .values() and .items() methods have been changed to always return dictionary view objects.
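
As a rough Python 3 sketch of the same idea, using the sample data from the question: the first function uses the key-view intersection described above, and the second is an order-preserving variant that walks d['tags'] itself and skips unknown IDs. The function names add_tagnames_unordered and add_tagnames_ordered are hypothetical, chosen only for this illustration.

```python
myjson = [
    {"tags": ["1", "3"], "otherdata": "blah"},
    {"tags": ["2", "4"], "otherdata": "blah blah"},
]
tagnames = [
    {"id": "1", "name": "bassoon"},
    {"id": "2", "name": "banjo"},
    {"id": "3", "name": "paw paw"},
    {"id": "4", "name": "foxes"},
]

# Build the id -> name lookup once; each lookup is then O(1) on average.
tagnames_map = {t["id"]: t["name"] for t in tagnames}


def add_tagnames_unordered(records, lookup):
    """Set-intersection version: result order is not guaranteed."""
    for d in records:
        # In Python 3, .keys() is already a set-like view, so & works directly.
        d["tagnames"] = [lookup[t] for t in lookup.keys() & d["tags"]]


def add_tagnames_ordered(records, lookup):
    """Membership-test version: keeps the order of d['tags']."""
    for d in records:
        d["tagnames"] = [lookup[t] for t in d["tags"] if t in lookup]


add_tagnames_ordered(myjson, tagnames_map)
```

Either version does one pass over tagnames to build the map and one pass over the tags of each record, instead of rescanning tagnames for every record.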

If you wanted to make a copy instead, do so using d.copy():

data = []
for d in myjson:
    d = d.copy()
    d['tagnames'] = [tagnames_map[t] for t in tagnames_map.viewkeys() & d['tags']]
    data.append(d)

dict.copy() creates a shallow copy; mutable values are not copied, so the new dict just references the same values. As you are not altering those values here, that is fine.
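
A small sketch of what "shallow" means here, using one record from the question. The variable names are only for illustration; copy.deepcopy() is mentioned as the standard-library alternative when the nested values themselves need to be changed.

```python
import copy

original = {"tags": ["1", "3"], "otherdata": "blah"}
copied = original.copy()

# Adding a key to the copy leaves the original dict untouched.
copied["tagnames"] = ["bassoon", "paw paw"]
original_has_tagnames = "tagnames" in original

# The "tags" list, however, is the very same object in both dicts.
shares_tags_list = copied["tags"] is original["tags"]

# A deep copy duplicates nested values too, so mutating them is safe.
deep = copy.deepcopy(original)
deep["tags"].append("4")
```

After this runs, original still has only its two tags and no 'tagnames' key, while copied and original share one 'tags' list.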

Running this against your sample input gives:

>>> pprint(data)
[{'otherdata': 'blah', 'tagnames': ['bassoon', 'paw paw'], 'tags': ['1', '3']},
 {'otherdata': 'blah blah',
  'tagnames': ['banjo', 'foxes'],
  'tags': ['2', '4']}]
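
The question also asked about a map-style version. One possible sketch, assuming Python 3.5+ (for the {**d, ...} unpacking) and a lookup dict like the tagnames_map built earlier; the helper name with_tagnames is hypothetical:

```python
tagnames_map = {"1": "bassoon", "2": "banjo", "3": "paw paw", "4": "foxes"}
myjson = [
    {"tags": ["1", "3"], "otherdata": "blah"},
    {"tags": ["2", "4"], "otherdata": "blah blah"},
]


def with_tagnames(d):
    """Return a shallow copy of d with a 'tagnames' list added."""
    names = [tagnames_map[t] for t in d["tags"] if t in tagnames_map]
    return {**d, "tagnames": names}


# map() applies the helper to every record; the inputs are not modified.
data = list(map(with_tagnames, myjson))
```

A list comprehension, [with_tagnames(d) for d in myjson], does the same thing and is often considered the more idiomatic spelling in Python.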

1 answer

solution1
2 ACCEPTED 2013-05-10 14:19:14
