Python - count and group items in list stored in dictionary

I have seen examples of how to count items in a dictionary or a list. My dictionary stores multiple lists. Each list stores multiple items.

d  = {'text1': ['A', 'C', 'E', 'F'],
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}

1. I want to count the frequency of each letter, i.e. the results should be:

A - 4  
B - 1  
C - 2  
D - 1  
E - 1  
F - 1

2. I want to group by each letter, i.e. the results should be:

A - text1, text2, text4, text5  
B - text4  
C - text1, text3  
D - text3  
E - text1  
F - text1  

How can I achieve both by using existing Python libraries, without writing many for loops?

To get to (2), you first have to invert the keys and values of the dictionary and store them as a list of (letter, key) pairs. Once you are there, sort the list and use groupby with a key function to get to the structure of (2). The counts for (1) are then just the lengths of the groups.

from itertools import groupby

arr = [(x,t) for t, a in d.items() for x in a]
# [('A', 'text2'), ('C', 'text3'), ('D', 'text3'), ('A', 'text1'), ('C', 'text1'), ('E', 'text1'), ('F', 'text1'), ('A', 'text4'), ('B', 'text4'), ('A', 'text5')]

res = {g: [x[1] for x in items] for g, items in groupby(sorted(arr), key=lambda x: x[0])}
#{'A': ['text1', 'text2', 'text4', 'text5'], 'C': ['text1', 'text3'], 'B': ['text4'], 'E': ['text1'], 'D': ['text3'], 'F': ['text1']}

res2 = {x: len(y) for x, y in res.items()}
#{'A': 4, 'C': 2, 'B': 1, 'E': 1, 'D': 1, 'F': 1}
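
One detail of the snippet above is worth spelling out: itertools.groupby only merges items that sit next to each other, which is why the list is passed through sorted() first. The following is a minimal sketch of what happens without the sort; the pairs list and the group_pairs helper are made up for illustration and are not part of the original answer.

```python
from itertools import groupby

# Hypothetical sample data: (letter, text) pairs in unsorted order.
pairs = [('A', 'text1'), ('C', 'text1'), ('A', 'text2'), ('C', 'text3')]

def group_pairs(pairs):
    # Collect the text of each pair under its letter.
    # groupby starts a new group every time the key changes.
    return {letter: [text for _, text in items]
            for letter, items in groupby(pairs, key=lambda pair: pair[0])}

# Without sorting, 'A' and 'C' each appear in two separate runs,
# and the later run overwrites the earlier one in the dict.
unsorted_result = group_pairs(pairs)

# With sorting, equal letters are adjacent and end up in one group.
sorted_result = group_pairs(sorted(pairs))

print(unsorted_result)  # {'A': ['text2'], 'C': ['text3']}
print(sorted_result)    # {'A': ['text1', 'text2'], 'C': ['text1', 'text3']}
```

So if the sort is dropped, no error is raised; entries are silently lost.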

PS: I hope you use meaningful variable names in your real code.

There are a few ways to accomplish this, but if you'd like to handle things without importing additional modules or installing external ones, this method will work cleanly 'out of the box.'

With d as your starting dictionary:

d  = {'text1': ['A', 'C', 'E', 'F'], 
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}

Create a new dict, called letters, for your results to live in, and populate it with the letters taken from the lists stored in d. If a letter isn't present yet, create its key with a list holding the count and a list of keys from d as its value. If it's already there, increment the count and append the current key from d to the key list in its value.

letters = {}
for item in d.keys():
    for letter in d[item]:
        if letter not in letters.keys():
            letters[letter] = [1,[item]]            
        else:
            letters[letter][0] += 1
            letters[letter][1] += [item]

This leaves you with a dict called letters whose values hold the count and the keys from d that contain each letter, like this:

{'E': [1, ['text1']], 'C': [2, ['text3', 'text1']], 'F': [1, ['text1']], 'A': [4, ['text2', 'text4', 'text1', 'text5']], 'B': [1, ['text4']], 'D': [1, ['text3']]}

Now, to print your first list, do:

for letter in sorted(letters):
    print(letter, letters[letter][0])

This prints each letter together with the first, or 'count', element of the list in its value, using the built-in sorted() function to put the letters in order.

To print the second list, likewise with sorted(), do the same, but take the second, or 'key', element of the list in the value and join it into a string with ', '.join():

for letter in sorted(letters):
    print(letter, ', '.join(letters[letter][1]))

To ease copy/paste, here's the code unbroken by my ramblings:

d  = {'text1': ['A', 'C', 'E', 'F'], 
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}

letters = {}
for item in d.keys():
    for letter in d[item]:
        if letter not in letters.keys():
            letters[letter] = [1,[item]]            
        else:
            letters[letter][0] += 1
            letters[letter][1] += [item]

print(letters)

for letter in sorted(letters):
    print(letter, letters[letter][0])
print()
for letter in sorted(letters):
    print(letter, ', '.join(letters[letter][1]))

Hope this helps!

from collections import Counter, defaultdict
from itertools import chain
d  = {'text1': ['A', 'C', 'E', 'F'], 
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}
counter = Counter(chain.from_iterable(d.values()))
group = defaultdict(list)
for k, v in d.items():
    for i in v:
        group[i].append(k)

out:

Counter({'A': 4, 'B': 1, 'C': 2, 'D': 1, 'E': 1, 'F': 1})
defaultdict(list,
            {'A': ['text2', 'text4', 'text1', 'text5'],
             'B': ['text4'],
             'C': ['text1', 'text3'],
             'D': ['text3'],
             'E': ['text1'],
             'F': ['text1']})

This is a way to achieve this:

    from collections import defaultdict

    alphabets = defaultdict(list)

    for text, letters in d.items():
        for letter in letters:
            alphabets[letter].append(text)

    for letter, texts in sorted(alphabets.items()):
        print(letter, texts)

    for letter, texts in sorted(alphabets.items()):
        print(letter, len(texts))

Note that once you have A - text1, text2, text4, text5, getting to A - 4 is just a matter of counting the texts.
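
As a rough sketch of that last point, the counts can be derived from the grouping in one extra step instead of being tracked separately. The names groups and counts are chosen here for illustration, and the 'A - 4' output format is taken from the question; the insertion order shown assumes Python 3.7 or later.

```python
from collections import defaultdict

d = {'text1': ['A', 'C', 'E', 'F'],
     'text2': ['A'],
     'text3': ['C', 'D'],
     'text4': ['A', 'B'],
     'text5': ['A']}

# Group: letter -> list of keys from d whose list contains that letter.
groups = defaultdict(list)
for text, letters in d.items():
    for letter in letters:
        groups[letter].append(text)

# Count: the frequency of a letter is the size of its group.
counts = {letter: len(texts) for letter, texts in groups.items()}

for letter in sorted(groups):
    print(f"{letter} - {counts[letter]}")
for letter in sorted(groups):
    print(f"{letter} - {', '.join(groups[letter])}")
```

This shortcut only holds when a letter appears at most once per list, as in the sample data; a letter repeated inside one list would add the same key to its group twice.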

For your first task:

from collections import Counter


d = {
  'text1': ['A', 'C', 'E', 'F'],
  'text2': ['A'],
  'text3': ['C', 'D'],
  'text4': ['A', 'B'],
  'text5': ['A']
}

occurrences = Counter(''.join(''.join(values) for values in d.values()))
print(sorted(occurrences.items(), key=lambda l: l[0]))

Now let me explain it:

  • ''.join(values) turns a list into a string (e.g. ['A', 'B', 'C', 'D'] into 'ABCD')
  • Then you join the strings from all the lists in the dictionary into one string (the outer ''.join())
  • Counter is a class from the standard-library collections module, which simply counts the elements in the iterable (a string in this case) and stores them as a mapping from element to count; its items() are (key, value) pairs (e.g. ('A', 4))
  • Finally, I sort the Counter items (it's just like a dictionary) alphabetically (key=lambda l: l[0], where l[0] is the letter from the (key, value) pair).
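
One caveat about the join-based approach: it counts characters, so it only gives the intended result when every item is a single character, as in the question. The sketch below uses hypothetical two-character labels to show the difference, and compares it with counting the items themselves through itertools.chain.from_iterable.

```python
from collections import Counter
from itertools import chain

# Hypothetical data: 'AB' is one item, not two letters.
d = {'text1': ['AB', 'C'],
     'text2': ['AB']}

# Joining everything into one string counts individual characters.
joined = Counter(''.join(''.join(values) for values in d.values()))

# Chaining the lists counts whole items, whatever their length.
chained = Counter(chain.from_iterable(d.values()))

print(sorted(joined.items()))   # [('A', 2), ('B', 2), ('C', 1)]
print(sorted(chained.items()))  # [('AB', 2), ('C', 1)]
```

For single-character items both variants produce the same counts.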

As far as I can see, you already have a solution for your second problem.
