Most frequent values in a dictionary
I have the following dictionary:
d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}
I would like to create a dictionary that gives the number of occurrences of each value. Basically, it would look like:
output = {"MRS":2,"PRS":1,"NTS":1,"VAL":1}
Does anyone know how I could do that? Thanks in advance!
Since your dict is composed of both strings and lists of strings, you first need to flatten those elements into a single sequence of strings:
import collections.abc

d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}

def flatten(l):
    # Recursively yield the leaf elements of nested iterables.
    # Strings are iterable too, so they are excluded explicitly and treated as leaves.
    for el in l:
        if isinstance(el, collections.abc.Iterable) and not isinstance(el, str):
            yield from flatten(el)
        else:
            yield el

>>> list(flatten(d.values()))
['MRS', 'VAL', 'PRS', 'MRS', 'NTS']
Then you can use a Counter to count the occurrences of each string:
>>> collections.Counter(flatten(d.values()))
Counter({'MRS': 2, 'VAL': 1, 'PRS': 1, 'NTS': 1})
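Since the title asks for the most frequent values, note that the resulting Counter can also rank them directly with its most_common() method. A minimal sketch, redefining the flatten generator from above so the snippet runs on its own:

```python
from collections import Counter
from collections.abc import Iterable

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

def flatten(items):
    # Yield leaf values, treating strings as atoms rather than iterables
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

counts = Counter(flatten(d.values()))
# most_common(n) returns the n highest-count (value, count) pairs
top = counts.most_common(1)
print(top)  # [('MRS', 2)]
```

Calling most_common() with no argument returns every pair, ordered from most to least frequent.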
As already posted, you can use collections.Counter, since it is the obvious approach; alternatively, you can use itertools.groupby on its own, or a combination of itertools.groupby and collections.Counter.
Just itertools.groupby:
>>> from itertools import groupby
>>> a, b = [list(g) for _, g in groupby(d.values(), type)]
>>> {k: len(list(g)) for k, g in groupby(sorted(a[0] + b))}
{'MRS': 2, 'NTS': 1, 'PRS': 1, 'VAL': 1}
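The sorted() call above matters: groupby only merges consecutive equal elements, so unsorted input can split one value across several groups. A small sketch on the flattened list of values:

```python
from itertools import groupby

words = ["MRS", "VAL", "PRS", "MRS", "NTS"]

# Without sorting, the two "MRS" entries are not adjacent, so they land in separate groups
unsorted_groups = [(k, len(list(g))) for k, g in groupby(words)]

# After sorting, equal values are adjacent and each one forms exactly one group
sorted_counts = {k: len(list(g)) for k, g in groupby(sorted(words))}

print(unsorted_groups)
print(sorted_counts)
```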
itertools.groupby and collections.Counter:
>>> from collections import Counter
>>> from itertools import groupby
>>> a, b = [list(g) for _, g in groupby(d.values(), type)]
>>> dict(Counter(a[0] + b))
{'MRS': 2, 'VAL': 1, 'PRS': 1, 'NTS': 1}
This just does the job for the OP's problem, though it is not robust: it relies on the values arriving as exactly two runs by type, a single list followed by the strings.
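To see why it is not robust, here is a sketch with a hypothetical dict d2 that holds the same values in a different insertion order (on Python 3.7+, dicts iterate in insertion order):

```python
from itertools import groupby

# Hypothetical variant of the question's dict: same values, but a string comes first
d2 = {"b": "PRS", "a": ["MRS", "VAL"], "c": "MRS", "d": "NTS"}

# Grouping by type now yields three runs (str, list, str) instead of two
groups = [list(g) for _, g in groupby(d2.values(), type)]
print(len(groups))  # 3

unpack_failed = False
try:
    a, b = groups
except ValueError as exc:
    # Three groups cannot be unpacked into the two names a and b
    unpack_failed = True
    print("unpacking failed:", exc)
```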
In general, you can use a Counter to map keys to counts; it's essentially a multiset.
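As a quick illustration of the multiset view, here is a sketch that counts the list value and the string values separately and then combines them; Counter addition sums multiplicities element by element:

```python
from collections import Counter

# A Counter behaves like a multiset: each element maps to its multiplicity
from_list = Counter(["MRS", "VAL"])
from_strings = Counter(["PRS", "MRS", "NTS"])

# Adding two Counters sums the multiplicities element by element
combined = from_list + from_strings
print(combined["MRS"])  # 2

# elements() expands the multiset back out, repeating each element by its count
print(sorted(combined.elements()))
```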
Since your dict is nested, you'll have to do a little transforming, but if you simply iterate over every value and sub-value in your dict and add each one to a Counter instance, you'll get what you want.
Here's a first-pass implementation; depending on exactly what d will contain, you may have to tweak it a bit:
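from collections import Counter
from collections.abc import Iterable

counts = Counter()
for elem in d.values():
    # Strings are iterable too, so exclude them explicitly
    if isinstance(elem, Iterable) and not isinstance(elem, str):
        for sub_elem in elem:
            counts[sub_elem] += 1
    else:
        counts[elem] += 1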
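Counter also has an update() method that counts every element of an iterable in one call, so the inner loop can be collapsed. A sketch wrapping the same logic in a hypothetical helper named count_values:

```python
from collections import Counter
from collections.abc import Iterable

def count_values(d):
    """Count each value in d, expanding one level of non-string iterables."""
    counts = Counter()
    for elem in d.values():
        if isinstance(elem, Iterable) and not isinstance(elem, str):
            # update() with an iterable adds 1 for each element it yields
            counts.update(elem)
        else:
            # A bare string (or other scalar) counts once
            counts[elem] += 1
    return counts

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}
print(count_values(d))
```

The else branch is still needed: passing a bare string to update() would count its individual characters.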
Notice that we check whether elem is an iterable and not a string. Python doesn't make distinguishing between strings and other collections easy, so if you know d will contain only strings and lists (for instance), you can simply use isinstance(elem, list) and so on. If you can't guarantee that the non-string values of d will all be lists (or tuples, and so on), it's better to explicitly exclude strings.
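A short sketch of what goes wrong without the string check: a str passes an Iterable test, and counting it directly tallies characters instead of the whole word.

```python
from collections import Counter
from collections.abc import Iterable

# A str is itself an Iterable, so a naive check treats it as a collection
print(isinstance("MRS", Iterable))  # True

# Feeding a bare string to Counter counts its characters, not the word
chars = Counter("MRS")
print(chars)

# Wrapping the string in a list counts the whole word once
words = Counter(["MRS"])
print(words)
```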
Also, if d could contain nested values (e.g. a list containing lists containing strings), this won't be sufficient; you'll likely want to write a recursive function to flatten everything, like dawg's solution.
I am lazy, so I am going to use library functions to get the job done for me:
import itertools
import collections

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}
# Wrap bare strings in a one-element list so every value is a list
values = [[x] if isinstance(x, str) else x for x in d.values()]
counter = collections.Counter(itertools.chain.from_iterable(values))
print(counter)
print(counter['MRS'])  # Sampling
Output:
Counter({'MRS': 2, 'VAL': 1, 'PRS': 1, 'NTS': 1})
2
In the end, counter acts like the dictionary you want (Counter is a subclass of dict).
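A sketch of that dict-like behavior, rebuilding the counter from the code above. One difference from a plain dict worth knowing: looking up a missing key returns 0 instead of raising KeyError, and does not insert the key.

```python
import itertools
import collections

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}
values = [[x] if isinstance(x, str) else x for x in d.values()]
counter = collections.Counter(itertools.chain.from_iterable(values))

# Counter is a dict subclass, so ordinary dict operations work
print(isinstance(counter, dict))  # True
print(sorted(counter.keys()))

# Unlike a plain dict, a missing key returns 0 instead of raising KeyError
print(counter["XYZ"])  # 0

# dict() gives a plain dictionary shaped like the requested output
output = dict(counter)
print(output)
```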
Consider this line:
values = [[x] if isinstance(x, str) else x for x in d.values()]
Here, I turned every value in the dictionary d into a list to make processing easier. values might look something like the following (the order might be different, which is fine):
# values = [['MRS', 'VAL'], ['MRS'], ['PRS'], ['NTS']]
Next, the expression:
itertools.chain.from_iterable(values)
returns an iterator that flattens the list; conceptually, the list now looks like this:
['MRS', 'VAL', 'MRS', 'PRS', 'NTS']
Finally, the Counter class takes that flattened sequence and counts each element, which gives us the final result.
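A small sketch contrasting chain.from_iterable with the more common chain(*values) form. Both yield the same elements; from_iterable takes the outer iterable as one argument instead of unpacking it into separate arguments. The result is a one-shot iterator, so it is empty once consumed:

```python
import itertools

values = [["MRS", "VAL"], ["MRS"], ["PRS"], ["NTS"]]

# from_iterable takes the outer iterable as a single argument and walks it lazily
lazy = itertools.chain.from_iterable(values)

# chain(*values) gives the same elements, but unpacks the outer list up front
unpacked = itertools.chain(*values)

flat_a = list(lazy)
flat_b = list(unpacked)
print(flat_a)  # ['MRS', 'VAL', 'MRS', 'PRS', 'NTS']

# The iterator is now exhausted, so a second pass yields nothing
leftover = list(lazy)
print(leftover)  # []
```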
You can do it with just built-in functions, this way:
>>> d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}
>>>
>>> flat = []
>>> for elem in d.values():
...     if isinstance(elem, list):
...         for sub_elem in elem:
...             flat.append(sub_elem)
...     else:
...         flat.append(elem)
...
>>> flat
['MRS', 'VAL', 'PRS', 'MRS', 'NTS']
>>>
>>> output = {}
>>>
>>> for item in flat:
...     output[item] = flat.count(item)
...
>>> output
{'MRS': 2, 'VAL': 1, 'PRS': 1, 'NTS': 1}
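Calling flat.count(item) inside the loop rescans the whole list for every item, so the work grows quadratically with the number of values. A sketch of a single-pass variant that still uses only built-ins, relying on dict.get with a default of 0:

```python
d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

output = {}
for elem in d.values():
    # Treat a list as several values and anything else as a single value
    items = elem if isinstance(elem, list) else [elem]
    for item in items:
        # dict.get supplies 0 the first time an item is seen
        output[item] = output.get(item, 0) + 1

print(output)
```

This also skips building the intermediate flat list.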
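Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.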