Faster way to do a correspondence replace operation in python?

I am not sure I'm using the right term for this; I'd call it a merge operation, maybe? Simple matching?

I have two dictionaries. One of them contains a list of tag IDs. The other one is a correspondence between tag IDs and tag names. I want to match the IDs and include the tag names in the first dict.

So, the first dictionary looks like this:

>>> myjson
[
{"tags" : ["1","3"],"otherdata" : "blah"},
{"tags" : ["2","4"],"otherdata" : "blah blah"}
]

The second dictionary looks like this:

>>> tagnames
[
{"id": "1", "name":"bassoon"},
{"id": "2", "name":"banjo"},
{"id": "3", "name":"paw paw"},
{"id": "4", "name":"foxes"}
]

To add the tag names that correspond to the tag IDs in myjson, I am currently doing this:

data = []
for j in myjson:
    d = j
    d['tagnames'] = [i['name'] for i in tagnames for y in d['tags'] if y==i['id']]
    data.append(d)

My desired output is this:

>>> data
[
{"tags" : ["1","3"],"otherdata" : "blah", "tagnames" : ["bassoon","paw paw"]},
{"tags" : ["2","4"],"otherdata" : "blah blah", "tagnames": ["banjo","foxes"]}
]

I'm getting the right output, but it seems really slow. I get that it's doing a full iteration over tagnames for every element in myjson (is that mxn? nxn?) and that that will be slow, but maybe there is a smarter syntax or a trick for speeding it up? Walk the array just once instead of n times?

Oooh, also, it would be cool if someone could suggest a way to do this assignment with a slick map or functional approach rather than the outer for loop.

You want to transform your tagnames list into a dictionary:

tagnames_map = {t['id']: t['name'] for t in tagnames}

Now you can find matching tag names much faster, because each lookup is a single dictionary access instead of a scan over the whole list. Your code already changes the dicts in place (d = j does not make a copy), so I'll simplify it to:

for d in myjson:
    d['tagnames'] = [tagnames_map[t] for t in tagnames_map.viewkeys() & d['tags']]

The dict.viewkeys() method (Python 2.7) returns a dictionary view object which acts like a set. We intersect that view with your list of tags, resulting in a set of tags that are all present in tagnames_map. By doing this we don't have to worry about any tags that are missing from the map.

If you are using Python 3, then you just use tagnames_map.keys() directly; in Python 3 the .keys(), .values() and .items() methods have been changed to always return dictionary view objects.
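One caveat with the intersection: its result is a set, so the names are not guaranteed to come out in the same order as the tags. As a minimal sketch of an alternative (not the code above, and assuming Python 3), you can walk each record's own tag list and skip any tag that is missing from the map:

```python
# Sample data copied from the question.
myjson = [
    {"tags": ["1", "3"], "otherdata": "blah"},
    {"tags": ["2", "4"], "otherdata": "blah blah"},
]
tagnames = [
    {"id": "1", "name": "bassoon"},
    {"id": "2", "name": "banjo"},
    {"id": "3", "name": "paw paw"},
    {"id": "4", "name": "foxes"},
]

# Build the id -> name lookup once; each later lookup is a hash access.
tagnames_map = {t["id"]: t["name"] for t in tagnames}

for d in myjson:
    # Iterate over the record's own tags so the names keep that order;
    # tags that are missing from the map are skipped.
    d["tagnames"] = [tagnames_map[t] for t in d["tags"] if t in tagnames_map]
```

This still touches each tag once per record, so the total work grows with the number of tags plus the size of tagnames rather than with their product.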

If you wanted to make a copy instead, do so using d.copy():

data = []
for d in myjson:
    d = d.copy()
    d['tagnames'] = [tagnames_map[t] for t in tagnames_map.viewkeys() & d['tags']]
    data.append(d)

dict.copy() creates a shallow copy; mutable values are not copied, the new dict will just reference the same values. As you are not altering those values here, that is fine.
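To make the shallow-copy behaviour concrete, here is a small illustration (the variable names original and clone are made up for this example): adding a key to the copy does not affect the original, but mutating a shared list in place does.

```python
original = {"tags": ["1", "3"], "otherdata": "blah"}
clone = original.copy()

# Adding a new key to the copy leaves the original dict untouched.
clone["tagnames"] = ["bassoon", "paw paw"]
print("tagnames" in original)  # False

# The nested list is the same object in both dicts, so an in-place
# change made through the copy is visible through the original too.
clone["tags"].append("4")
print(original["tags"])  # ['1', '3', '4']
```

If you ever need to change the nested values independently, copy.deepcopy() from the standard library copies them as well.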

Running this against your sample input gives:

>>> pprint(data)
[{'otherdata': 'blah', 'tagnames': ['bassoon', 'paw paw'], 'tags': ['1', '3']},
 {'otherdata': 'blah blah',
  'tagnames': ['banjo', 'foxes'],
  'tags': ['2', '4']}]
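As for the map-style variant you asked about, one possible sketch (assuming Python 3; the helper name with_tagnames is hypothetical, not something from your code) builds a new dict per record instead of assigning inside a loop:

```python
from functools import partial

# Sample data copied from the question.
myjson = [
    {"tags": ["1", "3"], "otherdata": "blah"},
    {"tags": ["2", "4"], "otherdata": "blah blah"},
]
tagnames = [
    {"id": "1", "name": "bassoon"},
    {"id": "2", "name": "banjo"},
    {"id": "3", "name": "paw paw"},
    {"id": "4", "name": "foxes"},
]
tagnames_map = {t["id"]: t["name"] for t in tagnames}


def with_tagnames(lookup, record):
    """Return a shallow copy of record with a 'tagnames' list added."""
    names = [lookup[t] for t in record["tags"] if t in lookup]
    return {**record, "tagnames": names}


# partial() fixes the lookup table, so map() only has to supply each record.
data = list(map(partial(with_tagnames, tagnames_map), myjson))
```

This is mostly a matter of taste; it does the same amount of work as the explicit loop, and it leaves myjson unmodified.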
