Most frequent values in a dictionary

I have the following dictionary:

d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}

I would like to create a dictionary that gives the number of occurrences of each value. Basically, it would look like:

output = {"MRS":2,"PRS":1,"NTS":1,"VAL":1}

Does anyone know how I could do that? Thanks in advance!

Since your dict contains both strings and lists of strings, you first need to flatten those elements into a single sequence of strings:

import collections
d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

>>> list(flatten(d.values()))
['MRS', 'VAL', 'MRS', 'PRS', 'NTS']

You can then use a Counter to count the occurrences of each string:

>>> collections.Counter(flatten(d.values())) 
Counter({'MRS': 2, 'NTS': 1, 'PRS': 1, 'VAL': 1})
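
The snippet above is written for Python 2. As a minimal sketch, assuming Python 3.3 or later, the same idea needs two changes: Iterable is imported from collections.abc, and str (plus bytes, as a precaution) takes the place of basestring. The parameter name items is my own choice.

```python
from collections import Counter
from collections.abc import Iterable


def flatten(items):
    """Yield every leaf in items, treating strings as leaves, not iterables."""
    for el in items:
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            # Recurse into nested lists, tuples, and so on.
            yield from flatten(el)
        else:
            yield el


d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}
counts = Counter(flatten(d.values()))
print(counts["MRS"])  # 2
```

Because flatten recurses, this also copes with lists nested inside lists.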

As already posted, collections.Counter is the obvious approach. Alternatively, you can use itertools.groupby on its own, or a combination of itertools.groupby and collections.Counter:

  1. Just itertools.groupby

     >>> from itertools import groupby
     >>> a, b = [list(g) for _, g in groupby(d.values(), type)]
     >>> {k: len(list(g)) for k, g in groupby(sorted(a[0] + b))}
     {'NTS': 1, 'VAL': 1, 'PRS': 1, 'MRS': 2}

  2. itertools.groupby and collections.Counter

     >>> from collections import Counter
     >>> from itertools import groupby
     >>> a, b = [list(g) for _, g in groupby(d.values(), type)]
     >>> dict(Counter(a[0] + b))
     {'NTS': 1, 'VAL': 1, 'PRS': 1, 'MRS': 2}

This does the job for the OP's specific problem, though it is not robust: it only works when d.values() yields exactly one list, first, followed by all the strings.
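
A groupby-based count does not have to depend on the order of the values. The following is only a sketch, assuming Python 3 and values that are either strings or lists of strings; the names wrapped, flat and output are illustrative.

```python
from itertools import chain, groupby

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

# Wrap bare strings in a list so every value can be chained uniformly.
wrapped = ([v] if isinstance(v, str) else v for v in d.values())

# groupby only merges adjacent equal items, so sort the flattened values first.
flat = sorted(chain.from_iterable(wrapped))

output = {key: sum(1 for _ in group) for key, group in groupby(flat)}
print(output)  # {'MRS': 2, 'NTS': 1, 'PRS': 1, 'VAL': 1}
```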

In general, you can use a Counter to map keys to counts; it's essentially a multiset.

Since your dict is multi-dimensional you'll have to do a little transforming, but if you simply iterate over every value and sub-value in your dict and add it to a Counter instance, you'll get what you want.

Here's a first-pass implementation; depending on exactly what d will contain you may have to tweak it a bit:

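import types
from collections import Counter, Iterable

counts = Counter()
for elem in d.values():
  # A non-string iterable (e.g. a list): count each of its items.
  if isinstance(elem, Iterable) and not isinstance(elem, types.StringTypes):
    for sub_elem in elem:
      counts[sub_elem] += 1
  else:
    # A plain string: count it once.
    counts[elem] += 1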

Notice that we check that elem is an iterable and not a string. Python doesn't make it easy to distinguish between strings and collections, so if you know d will contain only strings and lists (for instance), you can simply check isinstance(elem, list), and so on. If you can't guarantee that the collection values of d will all be lists (or tuples, and so on), it's better to explicitly exclude strings.
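
As a rough sketch of the simpler isinstance(elem, list) check, assuming Python 3 and that d holds only strings and flat lists of strings, Counter.update can absorb a whole list at once:

```python
from collections import Counter

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

counts = Counter()
for elem in d.values():
    if isinstance(elem, list):
        counts.update(elem)  # adds one count per item of the list
    else:
        counts[elem] += 1    # a bare string counts once

print(counts["MRS"])  # 2
```

The branch matters: calling counts.update on a bare string would count its individual characters rather than the string itself.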

Also, if d could contain nested values (e.g. a list containing lists of strings) this won't be sufficient; you'll likely want to write a recursive function to flatten everything, like dawg's solution.

I am lazy, so I am going to use library functions to get the job done for me:

import itertools
import collections

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}
values = [[x] if isinstance(x, basestring) else x for x in d.values()]
counter = collections.Counter(itertools.chain.from_iterable(values))
print counter
print counter['MRS']  # Sampling

Output:

Counter({'MRS': 2, 'NTS': 1, 'PRS': 1, 'VAL': 1})
2

At the end, counter acts like the dictionary you want.

Explanation

Consider this line:

values = [[x] if isinstance(x, basestring) else x for x in d.values()]

Here, I turned every value in the dictionary d into a list to make processing easier. values might look something like the following (the order might be different, which is fine):

# values = [['MRS', 'VAL'], ['MRS'], ['PRS'], ['NTS']]

Next, the expression:

itertools.chain.from_iterable(values)

returns an iterator that flattens the list; conceptually, the list now looks like this:

['MRS', 'VAL', 'MRS', 'PRS', 'NTS']

Finally, the Counter class takes that list and counts its items, so we end up with the final result.

You can also do it with just built-in functions, this way:
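
Since the title asks for the most frequent values, it may help to see how the resulting Counter can be queried. This is a sketch assuming Python 3, so str replaces basestring and print is a function; most_common is a standard Counter method.

```python
import collections
import itertools

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

# Wrap bare strings in a list so every value is a list, then flatten and count.
values = [[x] if isinstance(x, str) else x for x in d.values()]
counter = collections.Counter(itertools.chain.from_iterable(values))

print(counter.most_common(1))  # [('MRS', 2)]
print(dict(counter))           # a plain dict, if one is really needed
```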

>>> d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}
>>> 
>>> flat = []
>>> for elem in d.values():
    if isinstance(elem, list):
        for sub_elem in elem:
            flat.append(sub_elem)
    else:
        flat.append(elem)


>>> flat
['MRS', 'VAL', 'MRS', 'PRS', 'NTS']
>>> 
>>> output = {}
>>> 
>>> for item in flat:
    output[item] = flat.count(item)
>>>
>>> output
{'NTS': 1, 'PRS': 1, 'VAL': 1, 'MRS': 2}
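
One caveat with the session above: flat.count(item) rescans the whole list for every item, so the work grows quadratically with the number of values. A possible single-pass variant that still uses only built-ins might look like this (a sketch assuming Python 3 and values that are strings or flat lists; the name items is illustrative):

```python
d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

output = {}
for elem in d.values():
    # Treat a bare string as a one-item list so both cases share one loop.
    items = elem if isinstance(elem, list) else [elem]
    for item in items:
        # dict.get supplies 0 the first time an item is seen.
        output[item] = output.get(item, 0) + 1

print(output)  # on Python 3.7+: {'MRS': 2, 'VAL': 1, 'PRS': 1, 'NTS': 1}
```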
