
Find repeated elements in list and average corresponding elements

I have a list of the following type in my code:

nodes = [[n1, [x1,y1,z1], [a1,b1,...]], [n2, [x2,y2,z2], [a2,b2,..]], ...]

The elements n identify a node with a certain coordinate [x,y,z], and these can be repeated throughout the list. I want to average the [an,bn,...] lists for each node value n, so that I end up with a list nodes_averaged like [[[x1,y1,z1],[a1,b1,...]], [[xn,yn,zn],[an,bn,...]]].

For example, if I have the following list:

nodes= [[10, [6.5, 55.2, -10.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], 
        [10, [6.5, 55.2, -10.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
        [3, [4.3, 55.4, -15.0], [-0.0016, -0.00058, -0.0256, -7.07e-06, 0.00051, 0.0088]],
        [1, [8.7, 54.9, -15.0], [-0.0016, -0.00058, -0.0256, 1.2e-05, -0.00044, 0.0088]],
        [10, [6.5, 55.2, -10.0], [-0.0011, -0.00041, -0.027, -1.12e-05, -0.00043, 0.0038]],
        [3, [4.3, 55.4, -15.0], [-0.00113, -0.000413, -0.027, 2.84e-06, 0.00039, 0.00389]]]

I want to obtain (the order is not important):

nodes_avg= [[[6.5, 55.2, -10.0], [-0.00036666666666666667, -0.00013666666666666666, -0.009, -3.7333333333333333e-06, -0.00014333333333333334, 0.0012666666666666666]],
            [[4.3, 55.4, -15.0], [-0.001365, -0.0004965, -0.0263, -2.115e-06, 0.00045, 0.006345]], 
            [[8.7, 54.9, -15.0], [-0.0016, -0.00058, -0.0256, 1.2e-05, -0.00044, 0.0088]]]

So far my approach was to extract the list of node identifiers n:

noderng = [n1, n2,..., nn]

and use noderng=list(dict.fromkeys(noderng)) to eliminate repeated nodes, then iterate over both lists to match the elements n from the nodes list to the elements in the noderng list, like this:

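# Collect the node identifiers, then drop duplicates (keeps first-seen order)
noderng=[]
for l in nodes:
    noderng.append(l[0])
noderng=list(dict.fromkeys(noderng))

nodeavg={}
elposition={}

# For every unique node, scan the whole nodes list again and gather its values
for n in noderng:
    nodeavg[n]=[[],[],[],[],[],[]]
    for l in nodes:
        if int(l[0])==int(n):
            for i in range(6):
                nodeavg[n][i].append(l[2][i])
            elposition[n]=l[1]

# Average each of the six gathered value lists per node
avg={}
for key, strains in nodeavg.items():
    avg[key]=[]
    for ess in strains:
        avg[key].append(sum(ess)/len(ess))

# Pair each node's coordinates with its averaged values
nodes_avg=[]
for key, value in avg.items():
    nodes_avg.append([elposition[key],value])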

I get the desired result. The problem is that the nodes list can have hundreds of thousands of elements, and this operation then takes hours. I have switched to using NumPy arrays, which helps slightly, but it only shaves off a couple of minutes.

Is there a more efficient way to do this operation?

This is not an answer, just a few suggestions:

  1. You can use timer and cProfile to find out which part of your code takes the most time, and focus on that part.

  2. Another suggestion that has worked for me in the past is to create a dictionary from a set that includes your keys. In this way, you can reduce the hash time. This works if you know all the n values beforehand.

  3. Check this to get a better understanding of the time complexity of different data structures in Python.
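
As a rough illustration of the dictionary idea in the suggestions above, here is a minimal sketch that groups the entries in a single pass instead of rescanning the nodes list for every unique node. The function name average_by_node and its internal names are hypothetical, not from the question. It assumes the node identifiers are hashable and compare equal directly (the question's code compares them via int()), and that all value lists for a node have the same length.

```python
def average_by_node(nodes):
    """Average the value lists of all entries that share a node id.

    Hypothetical helper: each entry is [node_id, [x, y, z], [a, b, ...]].
    Every entry is visited exactly once.
    """
    sums = {}       # node id -> running element-wise sum of the value lists
    counts = {}     # node id -> number of entries seen for that node
    positions = {}  # node id -> [x, y, z] of the first entry seen

    for node_id, position, values in nodes:
        if node_id not in sums:
            # First occurrence: copy the values so the input is not mutated
            sums[node_id] = list(values)
            counts[node_id] = 1
            positions[node_id] = position
        else:
            acc = sums[node_id]
            for i, v in enumerate(values):
                acc[i] += v
            counts[node_id] += 1

    # Divide each running sum by the number of entries for that node
    return [
        [positions[n], [s / counts[n] for s in sums[n]]]
        for n in sums
    ]


# Small usage example with made-up values
nodes = [
    [10, [6.5, 55.2, -10.0], [0.0, 2.0]],
    [3, [4.3, 55.4, -15.0], [1.0, 3.0]],
    [10, [6.5, 55.2, -10.0], [4.0, 6.0]],
]
print(average_by_node(nodes))
# [[[6.5, 55.2, -10.0], [2.0, 4.0]], [[4.3, 55.4, -15.0], [1.0, 3.0]]]
```

The nested loops in the question do work proportional to the number of unique nodes times the number of entries, while this sketch does one dictionary lookup per entry. Whether that is the actual bottleneck is still worth confirming with a profiler, for example cProfile.run("average_by_node(nodes)").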


 