
How to update a large dictionary using a list quickly?

I am looking for a fast way to update the values in an (ordered) dictionary that contains tens of millions of entries, where the updated values are stored in a list/array.

The program I am writing takes the list of keys from the original dictionary (which are numerical tuples) as a NumPy array, and passes them through a function which returns an array of new numbers (one for each key). This array is then multiplied with the corresponding dictionary values (through element-wise array multiplication), and it is this resulting 1-D array of values that we wish to use to update the dictionary. The entries in the new array are stored in the order of the corresponding keys, so I could use a loop to go through the dictionary and update the values one by one. But this is too inefficient. Is there a faster way to update the values in this dictionary that doesn't use loops?

An example of a similar problem: suppose the keys in a dictionary represent the x- and y-coordinates of points in space, and the values represent the forces being applied at those points. If we want to calculate the torque about the origin at each point, we would first need a function like:

def euclid(xy):
   return (xy[0]**2 + xy[1]**2)**0.5

If xy represents the (x, y) tuple, this returns the Euclidean distance from the origin. We could then multiply this by the corresponding dictionary value to get the torque, like so:

for xy in dict.keys():
   dict[xy] = euclid(xy)*dict[xy]
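As a small worked example of the loop above, here is a sketch on three made-up points. The dictionary name `forces` and the sample data are assumptions for illustration only (a separate name also avoids shadowing the built-in `dict`).

```python
def euclid(xy):
    # Euclidean distance of the point (x, y) from the origin
    return (xy[0]**2 + xy[1]**2)**0.5

# Hypothetical sample data: (x, y) -> force applied at that point
forces = {(3, 4): 2.0, (6, 8): 0.5, (0, 1): 7.0}

# Replace each force with distance * force, one key at a time
for xy in forces:
    forces[xy] = euclid(xy) * forces[xy]

print(forces)  # {(3, 4): 10.0, (6, 8): 5.0, (0, 1): 7.0}
```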

But this loop is slow, and we could take advantage of array algebra to get the new values in one operation:

new_dict_values = euclid(np.array(list(dict.keys())).T) * np.array(list(dict.values()))
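Two details matter for the one-line version to work in Python 3. `dict.keys()` and `dict.values()` return view objects, so they need to be wrapped in `list()` before `np.array` can build a numeric array from them. The key array has shape (N, 2), so it also has to be transposed for `xy[0]` and `xy[1]` to select all x- and all y-coordinates. A minimal sketch, using the hypothetical name `forces` and made-up data:

```python
import numpy as np

def euclid(xy):
    # Works on a single (x, y) tuple or on a (2, N) array of coordinates
    return (xy[0]**2 + xy[1]**2)**0.5

# Hypothetical sample data: (x, y) -> force applied at that point
forces = {(3, 4): 2.0, (6, 8): 0.5, (0, 1): 7.0}

keys = np.array(list(forces.keys()))      # shape (N, 2)
values = np.array(list(forces.values()))  # shape (N,)

# Transpose so that xy[0] holds every x and xy[1] holds every y
new_values = euclid(keys.T) * values      # shape (N,), same order as the keys
```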

And it is here that we wish to find a fast method to update the dictionary, instead of using:

i = 0
for key in dict.keys():
    dict[key] = new_dict_values[i]
    i += 1

That last piece of code isn't just slow. I don't think it does what you want it to do:

for key in dict.keys():
    for i in range(len(new_dict_values)):
        dict[key] = new_dict_values[i]

For every key in the dictionary, you are iterating through the entire list of new_dict_values and assigning each element in turn to that key, overwriting the value assigned in the previous iteration of the inner loop. This gives you a dictionary where every key has the value of the last element in new_dict_values, which I don't think is what you want.
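The effect is easy to see on a tiny example. The names `values` and `new_values` and the data below are made up for illustration:

```python
values = {"a": 1, "b": 2, "c": 3}
new_values = [10, 20, 30]

for key in values:
    for i in range(len(new_values)):
        # Each pass of the inner loop overwrites the previous assignment,
        # so only the last element of new_values survives for this key.
        values[key] = new_values[i]

print(values)  # {'a': 30, 'b': 30, 'c': 30}
```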

If you are certain that the order of the keys in the dictionary is the same as the order of the values in new_dict_values, then you can do this:

for key, value in zip(dict.keys(), new_dict_values):
    dict[key] = value
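The same pairing can also be handed to `dict()` or `dict.update()` directly, which in CPython moves the per-item loop out of Python bytecode. Whether that is noticeably faster for tens of millions of entries is worth timing on your own data; this is only a sketch, with the hypothetical name `forces` and made-up values:

```python
# Hypothetical sample data: (x, y) -> force applied at that point
forces = {(3, 4): 2.0, (6, 8): 0.5, (0, 1): 7.0}
new_values = [10.0, 5.0, 7.0]  # assumed to be in the same order as the keys

# Build a fresh dictionary from the old keys and the new values.
rebuilt = dict(zip(forces, new_values))

# Or overwrite the values in place. Only existing keys are assigned,
# so the dictionary's size and key order do not change.
forces.update(zip(list(forces), new_values))
```

If the new values live in a NumPy array, passing `new_values.tolist()` instead stores plain Python floats in the dictionary rather than NumPy scalars.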

Edit: Also, for future reference, there is no need in Python to iterate over a range of numbers and access the elements of a list by index. This:

for i in range(len(new_dict_values)):
    dict[key] = new_dict_values[i]

is equivalent to this:

for i in new_dict_values:
    dict[key] = i
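To make the equivalence concrete, the sketch below collects the items visited by each style of loop and shows they are the same. It also shows `enumerate`, which supplies the index for the cases where it is genuinely needed. The list contents are made up for illustration.

```python
new_values = [10, 20, 30]

# Index-based iteration
by_index = []
for i in range(len(new_values)):
    by_index.append(new_values[i])

# Direct iteration visits the same items in the same order
direct = []
for value in new_values:
    direct.append(value)

# enumerate yields (index, item) pairs without a manual counter
pairs = list(enumerate(new_values))

print(by_index == direct)  # True
print(pairs)               # [(0, 10), (1, 20), (2, 30)]
```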
