Python: Reduce a list of dicts based on matching key/value pairs
I have a list of dicts that specify flows (source to hop to destination, each with its volume). Now I want to split these flows into links (e.g. source to hop with volume, and hop to destination with volume) and merge all duplicate links by summing their volumes.
Since I'm new to Python, I'm wondering what a good approach would be. My first idea is to loop through all flows and, inside that loop, loop through all links collected so far to check whether the link already exists.
But if I have millions of flows, I guess that could become quite inefficient and slow.
My starting data looks like this:
flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200,
    },
]
What my result should be:
links = [
    {
        'source': 1,
        'hop': 2,
        'volume': 150,
    },{
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'hop': 2,
        'destination': 4,
        'volume': 250,
    },{
        'source': 2,
        'hop': 2,
        'volume': 200,
    },
]
Thanks a lot for your help!
You can collect the links into two different dictionaries, one keyed by (source, hop) and another keyed by (hop, destination). Then you can easily build the result list from each of the two dicts in turn. Below, Counter is used, which is a dict-like object with 0 as the default value for missing keys:
import pprint
from collections import Counter

flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100.5,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200.7,
    },
]

# Total volume per (source, hop) link and per (hop, destination) link
sources = Counter()
hops = Counter()

# Single pass: each flow contributes its volume to exactly two links
for f in flows:
    sources[f['source'], f['hop']] += f['volume']
    hops[f['hop'], f['destination']] += f['volume']

# Turn the aggregated totals back into a list of link dicts
res = [{'source': source, 'hop': hop, 'volume': vol} for (source, hop), vol in sources.items()]
res.extend([{'hop': hop, 'destination': dest, 'volume': vol} for (hop, dest), vol in hops.items()])
pprint.pprint(res)
Output:
[{'hop': 2, 'source': 1, 'volume': 150.5},
 {'hop': 2, 'source': 2, 'volume': 200.7},
 {'destination': 3, 'hop': 2, 'volume': 100.5},
 {'destination': 4, 'hop': 2, 'volume': 250.7}]
The above runs in O(n) time, because each flow is visited once and each dictionary update takes constant time on average, so it should work with millions of flows provided you have enough memory.
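On the memory point: the loop only needs to see one flow at a time, so the flows themselves do not have to be held in a list. Here is a minimal sketch of the same idea wrapped in a function that accepts any iterable, including a generator. The names aggregate_links and generate_flows are made up for illustration, and generate_flows merely stands in for whatever actually produces the flows (a file, a database cursor, and so on).

```python
from collections import Counter


def aggregate_links(flows):
    """Sum volumes per (source, hop) and per (hop, destination) link.

    `flows` may be any iterable of flow dicts; it is consumed once.
    """
    sources = Counter()
    hops = Counter()
    for f in flows:
        sources[f['source'], f['hop']] += f['volume']
        hops[f['hop'], f['destination']] += f['volume']
    links = [{'source': s, 'hop': h, 'volume': v} for (s, h), v in sources.items()]
    links.extend({'hop': h, 'destination': d, 'volume': v} for (h, d), v in hops.items())
    return links


def generate_flows():
    # Hypothetical data source: yields flows one by one instead of building a list.
    yield {'source': 1, 'hop': 2, 'destination': 3, 'volume': 100}
    yield {'source': 1, 'hop': 2, 'destination': 4, 'volume': 50}
    yield {'source': 2, 'hop': 2, 'destination': 4, 'volume': 200}


links = aggregate_links(generate_flows())
```

With this shape, memory use grows with the number of distinct links rather than with the number of flows, which helps when many flows share the same links.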
Pseudo algorithm: