
Python: reduce a list of dicts based on matching key/value pairs

I have a list of dicts that specify flows (source to hop to destination, each with its volume). Now I want to split these flows into links (e.g. source to hop with volume, hop to destination with volume) and merge all duplicate links by summing their volumes.

Since I'm new to Python, I'm wondering what a good approach would be. My first idea is to loop through all flows and, inside that loop, go through all links collected so far to check whether the link already exists.

But with millions of flows, I guess that might become quite inefficient and slow.

My starting data looks like this:

flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200,
    },
]

What my result should be:

links = [
    {
        'source': 1,
        'hop': 2,
        'volume': 150,
    },{
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'hop': 2,
        'destination': 4,
        'volume': 250,
    },{
        'source': 2,
        'hop': 2,
        'volume': 200,
    },
]

Thanks a lot for your help!

You can collect the links into two separate dictionaries, one for links between source and hop and another for links between hop and destination. Then you can build the result list from each of the two dicts in turn. The code below uses Counter, a dict-like object that returns 0 for missing keys:

import pprint
from collections import Counter

flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100.5,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200.7,
    },
]

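# Accumulated volume per link, keyed by the pair of nodes the link connects
sources = Counter()  # (source, hop) -> volume
hops = Counter()     # (hop, destination) -> volume

# One pass over the flows; each flow contributes to exactly two links
for f in flows:
    sources[f['source'], f['hop']] += f['volume']
    hops[f['hop'], f['destination']] += f['volume']

# Turn the two counters back into a list of link dicts
res = [{'source': source, 'hop': hop, 'volume': vol} for (source, hop), vol in sources.items()]
res.extend([{'hop': hop, 'destination': dest, 'volume': vol} for (hop, dest), vol in hops.items()])
pprint.pprint(res)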

Output:

[{'hop': 2, 'source': 1, 'volume': 150.5},
 {'hop': 2, 'source': 2, 'volume': 200.7},
 {'destination': 3, 'hop': 2, 'volume': 100.5},
 {'destination': 4, 'hop': 2, 'volume': 250.7}]

The code above runs in O(n) time, so it should work with millions of flows provided you have enough memory.
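The same single-pass idea can be wrapped in a function. The following is only a sketch: the name aggregate_links is made up for illustration, and it assumes every flow dict has the four keys shown in the question. Because it only iterates once, it accepts any iterable, including a generator, so the memory needed for the counters grows with the number of distinct links rather than with the number of flows.

```python
from collections import Counter


def aggregate_links(flows):
    """Sum volumes per (source, hop) link and per (hop, destination) link.

    `aggregate_links` is a hypothetical helper name, not part of any library.
    """
    source_links = Counter()  # (source, hop) -> total volume
    hop_links = Counter()     # (hop, destination) -> total volume

    for flow in flows:  # single pass; works for lists and generators alike
        source_links[flow['source'], flow['hop']] += flow['volume']
        hop_links[flow['hop'], flow['destination']] += flow['volume']

    links = [{'source': s, 'hop': h, 'volume': v}
             for (s, h), v in source_links.items()]
    links += [{'hop': h, 'destination': d, 'volume': v}
              for (h, d), v in hop_links.items()]
    return links


# The integer data from the question, fed in as an iterator
flows = [
    {'source': 1, 'hop': 2, 'destination': 3, 'volume': 100},
    {'source': 1, 'hop': 2, 'destination': 4, 'volume': 50},
    {'source': 2, 'hop': 2, 'destination': 4, 'volume': 200},
]
links = aggregate_links(iter(flows))
```

With this input, links holds the same four merged links the question asks for, with the source-to-hop links listed first.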

Pseudo algorithm:

  1. Create an empty result list/set/dictionary.
  2. Loop over the flows list.
  3. Split each single flow into 2 links.
  4. For each of these 2 links, test whether it is already in the result (based on its 2 nodes).
  5. If not: add it. If yes: update the volume of the one already in the result.
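The steps above could be sketched in Python roughly as follows. This is one possible reading of the pseudo algorithm, not a definitive implementation: the function name merge_links is invented here, and a plain dict is chosen as the result container so that the membership test in step 4 is a constant-time lookup rather than a scan over a list.

```python
def merge_links(flows):
    """Hypothetical helper that follows the five steps of the pseudo algorithm."""
    # Step 1: empty result, keyed by the two labelled nodes of a link
    result = {}

    # Step 2: loop over the flows
    for flow in flows:
        # Step 3: split the flow into its two links
        two_links = [
            (('source', flow['source']), ('hop', flow['hop'])),
            (('hop', flow['hop']), ('destination', flow['destination'])),
        ]
        for key in two_links:
            # Step 4: is this link already known?
            if key not in result:
                # Step 5a: no, so add it with this flow's volume
                result[key] = flow['volume']
            else:
                # Step 5b: yes, so add this flow's volume to the total
                result[key] += flow['volume']

    # Each key is a pair of (name, node) tuples, so dict() can rebuild the
    # link dict from it, with the volume passed as an extra keyword argument
    return [dict(key, volume=volume) for key, volume in result.items()]


flows = [
    {'source': 1, 'hop': 2, 'destination': 3, 'volume': 100},
    {'source': 1, 'hop': 2, 'destination': 4, 'volume': 50},
    {'source': 2, 'hop': 2, 'destination': 4, 'volume': 200},
]
links = merge_links(flows)
```

Since dicts keep insertion order in Python 3.7 and later, the links come out in the order they are first seen, which for the question's data matches the order of the desired result.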
