Identify same values for particular key in list of dictionaries
I have a list of dictionaries that looks like this:
[
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 16, 'primary': '16', 'secondary': '8'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 8, 'primary': '8', 'secondary': '16'},
{'ServiceID': 12, 'primary': '12', 'secondary': '20'},
{'ServiceID': 8, 'primary': '8', 'secondary': '16'}
]
I would like to create a new sorted list of dictionaries, where each dictionary has the following:
key = value of 'ServiceID'
key = value of how many times that particular 'ServiceID' is listed as a 'primary'
key = value of how many times that particular 'ServiceID' is listed as a 'secondary'
For example:
[
{'ServiceID': 8, 'primaryCount': 2, 'secondaryCount': 1},
{'ServiceID': 12, 'primaryCount': 1, 'secondaryCount': 4},
{'ServiceID': 16, 'primaryCount': 1, 'secondaryCount': 2},
{'ServiceID': 20, 'primaryCount': 4, 'secondaryCount': 1}
]
The code I have so far doesn't do what I want. I am unsure how to increment the primary and secondary counts correctly across the entire for loop, and how to make sure I capture each 'ServiceID' only once.
I believe there is something wrong with my logic:
temp_count_list = list()
temp_primary_counts = 0
temp_secondary_counts = 0
for sub_dict in new_list:
    temp_dict = dict()
    temp_dict['ServiceID'] = sub_dict['ServiceID']
    if sub_dict['ServiceID'] == int(sub_dict['primarySlice']):
        temp_dict['primaryCount'] = temp_primary_counts +=1
    if sub_dict['ServiceID'] == int(sub_dict['secondarySlice']):
        temp_dict['secondaryCount'] = temp_secondary_counts +=1
    temp_count_list.append(temp_dict)
The basic idea is to collect all the ServiceID, primary, and secondary values in a dict (k in the code), and then, for each unique ServiceID, count how often that ServiceID occurs among the primary and secondary values.
l = [
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 16, 'primary': '16', 'secondary': '8'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 8, 'primary': '8', 'secondary': '16'},
{'ServiceID': 12, 'primary': '12', 'secondary': '20'},
{'ServiceID': 8, 'primary': '8', 'secondary': '16'}
]
# Collect each field's values into parallel lists
k = {'ServiceID': [], 'primaryCount': [], 'secondaryCount': []}
for i in l:
    k['ServiceID'].append(i['ServiceID'])
    k['primaryCount'].append(i['primary'])
    k['secondaryCount'].append(i['secondary'])

result = []
for i in sorted(set(k['ServiceID'])):
    # primary/secondary are stored as strings, so compare against str(i)
    res = {'ServiceID': i,
           'primaryCount': k['primaryCount'].count(str(i)),
           'secondaryCount': k['secondaryCount'].count(str(i))}
    result.append(res)
print(result)
Output:
[
{'ServiceID': 8, 'primaryCount': 2, 'secondaryCount': 1},
{'ServiceID': 12, 'primaryCount': 1, 'secondaryCount': 4},
{'ServiceID': 16, 'primaryCount': 1, 'secondaryCount': 2},
{'ServiceID': 20, 'primaryCount': 4, 'secondaryCount': 1}
]
You can do the following (l is your list):
# One zeroed pair of counters per ServiceID seen in the input
d = {i['ServiceID']: {'primaryCount': 0, 'secondaryCount': 0} for i in l}
for i in l:
    d[int(i['primary'])]['primaryCount'] += 1
    d[int(i['secondary'])]['secondaryCount'] += 1
res = [{'ServiceID': i, 'primaryCount': k['primaryCount'], 'secondaryCount': k['secondaryCount']} for i, k in d.items()]
Output:
>>> print(res)
[{'ServiceID': 20, 'primaryCount': 4, 'secondaryCount': 1}, {'ServiceID': 16, 'primaryCount': 1, 'secondaryCount': 2}, {'ServiceID': 8, 'primaryCount': 2, 'secondaryCount': 1}, {'ServiceID': 12, 'primaryCount': 1, 'secondaryCount': 4}]
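Note that the result above follows the insertion order of d rather than being sorted, and that d[int(i['primary'])] raises a KeyError if a primary or secondary value has no matching ServiceID in the input. A minimal sketch that handles both is shown here. It assumes unmatched values should simply be skipped (the question does not say), and the function name count_roles is hypothetical:

```python
def count_roles(rows):
    """Count how often each ServiceID appears as a primary or secondary.

    Assumption: primary/secondary values with no matching ServiceID
    are ignored instead of raising KeyError.
    """
    counts = {row['ServiceID']: {'primaryCount': 0, 'secondaryCount': 0}
              for row in rows}
    for row in rows:
        for field, counter in (('primary', 'primaryCount'),
                               ('secondary', 'secondaryCount')):
            sid = int(row[field])
            if sid in counts:
                counts[sid][counter] += 1
    # Iterate the keys in sorted order to get the ordering the question asks for
    return [{'ServiceID': sid, **counts[sid]} for sid in sorted(counts)]


l = [
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 16, 'primary': '16', 'secondary': '8'},
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 8, 'primary': '8', 'secondary': '16'},
    {'ServiceID': 12, 'primary': '12', 'secondary': '20'},
    {'ServiceID': 8, 'primary': '8', 'secondary': '16'},
]
res = count_roles(l)
print(res)
```

If unmatched values should be reported rather than skipped, the membership check is the only line that needs to change.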
It seems like the correct solution here would involve using collections.Counter objects (or, largely equivalently in this case, collections.defaultdict(int) objects). They let you cheaply and easily increment counts without relying on equal values being adjacent in the input, and without intermediate data structures that add pointless overhead. Why build the result all at once when you can count the parts you care about with simpler code, then build the result from those simple counts with equally simple code? You don't actually use the 'ServiceID' field in the input, so you may as well just count efficiently and convert back to the preferred format at the end:
import pprint # For pretty-printing in the example
from collections import Counter
inp = [
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 16, 'primary': '16', 'secondary': '8'},
{'ServiceID': 20, 'primary': '20', 'secondary': '12'},
{'ServiceID': 8, 'primary': '8', 'secondary': '16'},
{'ServiceID': 12, 'primary': '12', 'secondary': '20'},
{'ServiceID': 8, 'primary': '8', 'secondary': '16'}
]
primarycount = Counter()
secondarycount = Counter()
for d in inp:
    primarycount[int(d['primary'])] += 1  # Counts times seen as primary
    secondarycount[int(d['secondary'])] += 1  # Counts times seen as secondary
# Just to see intermediate results
print(primarycount)
print(secondarycount)
# Make new list mapping each thing seen to its counts
# The union of keys ensures anything with even one count in input appears in the output
# Sorting the union before iterating gets desired output order
result = [{'ServiceID': sid, 'primaryCount': primarycount[sid], 'secondaryCount': secondarycount[sid]}
          for sid in sorted(primarycount.keys() | secondarycount.keys())]
pprint.pprint(result)
which produces the output:
Counter({20: 4, 8: 2, 16: 1, 12: 1})
Counter({12: 4, 16: 2, 8: 1, 20: 1})
[{'ServiceID': 8, 'primaryCount': 2, 'secondaryCount': 1},
{'ServiceID': 12, 'primaryCount': 1, 'secondaryCount': 4},
{'ServiceID': 16, 'primaryCount': 1, 'secondaryCount': 2},
{'ServiceID': 20, 'primaryCount': 4, 'secondaryCount': 1}]
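The collections.defaultdict(int) alternative mentioned above can be sketched as follows. This is an illustrative variant, not part of the original answer. The one difference worth noting is that indexing a defaultdict with a missing key inserts that key, whereas a Counter just returns 0, so the sketch reads the counts with .get(sid, 0):

```python
from collections import defaultdict

inp = [
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 16, 'primary': '16', 'secondary': '8'},
    {'ServiceID': 20, 'primary': '20', 'secondary': '12'},
    {'ServiceID': 8, 'primary': '8', 'secondary': '16'},
    {'ServiceID': 12, 'primary': '12', 'secondary': '20'},
    {'ServiceID': 8, 'primary': '8', 'secondary': '16'},
]

primarycount = defaultdict(int)    # missing keys start at 0
secondarycount = defaultdict(int)
for d in inp:
    primarycount[int(d['primary'])] += 1
    secondarycount[int(d['secondary'])] += 1

# .get(sid, 0) reads a count without adding the key to the defaultdict
result = [{'ServiceID': sid,
           'primaryCount': primarycount.get(sid, 0),
           'secondaryCount': secondarycount.get(sid, 0)}
          for sid in sorted(primarycount.keys() | secondarycount.keys())]
print(result)
```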
This might be slightly wrong in two cases. First, if some ServiceIDs appear in the input but never as a primary or secondary, they won't appear in the output at all, rather than appearing with zero counts (it is unclear which is correct). Second, if primary or secondary values sometimes appear where the corresponding ServiceID never appears in the input, they'll show up in the output with counts, rather than being omitted (again, it is unclear which is correct). Both are relatively trivial to fix. Flipping both behaviors would just involve changing primarycount.keys() | secondarycount.keys() to {d['ServiceID'] for d in inp}, to ensure values come from the input ServiceID fields, not from a combination of all values seen for primary and secondary. For the provided input, both approaches are equivalent (with the former being slightly faster in most cases, where there are many duplicate ServiceIDs in the input).
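To make the difference between the two key choices concrete, here is a small sketch on a made-up input (not from the question) in which ServiceID 5 is never a primary or secondary and the secondary value '99' has no matching ServiceID. The helper name build is hypothetical:

```python
from collections import Counter

# Hypothetical input, chosen so the two key choices give different results
inp = [
    {'ServiceID': 5, 'primary': '7', 'secondary': '99'},
    {'ServiceID': 7, 'primary': '7', 'secondary': '99'},
]

primarycount = Counter(int(d['primary']) for d in inp)
secondarycount = Counter(int(d['secondary']) for d in inp)


def build(ids):
    # Counter returns 0 for keys it has never seen, so plain indexing is safe
    return [{'ServiceID': sid,
             'primaryCount': primarycount[sid],
             'secondaryCount': secondarycount[sid]}
            for sid in sorted(ids)]


# Keys taken from every value seen as a primary or secondary
by_seen = build(primarycount.keys() | secondarycount.keys())
# Keys taken from the ServiceID fields only
by_service_id = build({d['ServiceID'] for d in inp})

print(by_seen)
print(by_service_id)
```

With the union of keys, 99 is reported and 5 is missing. With the ServiceID set, 5 is reported with zero counts and 99 is dropped.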