
Building a list of lists from a frequency dictionary in Python

I need help finding a shortcut to build a frequency-sorted list of lists from a frequency dictionary. I am able to build a list of lists (see below) by appending each element to a list and then appending each list to the 'list of lists' (easy with only frequencies 1-3), but what happens if I have frequencies up to 100 or more?! There has to be a better way.

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
list_1 = []
list_2 = []
list_3 = []
list_of_lists = []

for key, value in dictionary.items():
    if value == 1:
        list_1.append(key)
for key, value in dictionary.items():
    if value == 2:
        list_2.append(key)
for key, value in dictionary.items():
    if value == 3:
        list_3.append(key)

list_of_lists.append(list_1)
list_of_lists.append(list_2)
list_of_lists.append(list_3)

print list_of_lists

The output of a run in Python looks like this:

[['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

This is exactly what I want, but it won't work for a corpus of 100,000+ words with frequencies of 100+. Please help me find a better, less tedious way of building my list of lists.


Solution 1 - Inverse mapping via a list of lists (what was asked for)

You are looking for something like a histogram, but the inverse.

def inverseHistogram(valueFreqPairs):
    maxFreq = max(p[1] for p in valueFreqPairs)+1
    R = [[] for _ in range(maxFreq)]
    for value,freq in valueFreqPairs:
        R[freq] += [value]
    return R

Demo:

>>> inverseHistogram(dictionary.items())
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]
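
For readers on Python 3, the same idea might be sketched as follows. The name `inverse_histogram` and the choice to return an empty list for empty input are assumptions of this sketch, not part of the answer above.

```python
def inverse_histogram(freqs):
    """Return a list whose index f holds the keys that occur f times."""
    if not freqs:
        return []  # assumption: empty input yields an empty result
    # One independent bucket per possible frequency, including index 0.
    buckets = [[] for _ in range(max(freqs.values()) + 1)]
    for key, freq in freqs.items():
        buckets[freq].append(key)
    return buckets


dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}
print(inverse_histogram(dictionary))
# [[], ['cd', 'de', 'fg'], ['ab', 'gh'], ['bc', 'ef']]
```

On Python 3.7+ each bucket keeps the dictionary's insertion order, so the order inside the groups differs from the Python 2 demo above.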

Solution 2 - Inverse mapping via the defaultdict pattern (much cleaner)

Even better if you are content with using a dictionary to organize the inverse (which seems more elegant). This is how I'd personally do it.

import collections

reverseDict = collections.defaultdict(list)
for value,freq in dictionary.items():
    reverseDict[freq].append(value)

Demo:

>>> dict(reverseDict)
{1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']}

Side note: This also saves space when your frequencies are sparse. For example, if your input were {'onlyitem':999999999}, you avoid having to build a list larger than your memory, which would lock up your machine.
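
If a frequency-sorted list of lists is still wanted, one possible sketch is to sort the dictionary's keys instead of allocating a slot for every frequency. The helper name `group_by_frequency` and the extra `'rare'` entry are made up for illustration.

```python
from collections import defaultdict


def group_by_frequency(freqs):
    """Map each frequency to the list of keys that have it."""
    groups = defaultdict(list)
    for key, freq in freqs.items():
        groups[freq].append(key)
    return groups


# Sparse input: only two distinct frequencies, very far apart.
sparse = {'onlyitem': 999999999, 'rare': 1}
groups = group_by_frequency(sparse)

# Only frequencies that actually occur get a list, in ascending order.
list_of_lists = [groups[f] for f in sorted(groups)]
print(list_of_lists)  # [['rare'], ['onlyitem']]
```

The trade-off is that the list index no longer equals the frequency; the groups are merely ordered by it.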

Best way: throw them all into a dict

result = {}

for key, value in dictionary.iteritems():
  if value not in result:
    result[value] = []
  result[value].append(key)

Slightly simpler:

from collections import defaultdict
result = defaultdict(list)

for key, value in dictionary.iteritems():
  result[value].append(key)

Or to create a list:

# a comprehension creates independent lists; [[]] * n would repeat one shared list
result = [[] for _ in range(max(dictionary.values()))]

for key, value in dictionary.iteritems():
  result[value-1].append(key)

dict_of_lists = {}

for key, value in dictionary.items():
    if value in dict_of_lists:
        dict_of_lists[value].append(key)
    else:
        dict_of_lists[value] = [key]

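# dict ordering is not guaranteed, so order the groups by frequency explicitly
list_of_lists = [dict_of_lists[freq] for freq in sorted(dict_of_lists)]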
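
The if/else grouping above can also be written with `dict.setdefault`, which returns the existing list for a key or stores and returns a new one. This is a Python 3 sketch of an alternative, not the answer author's code.

```python
dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}

dict_of_lists = {}
for key, value in dictionary.items():
    # setdefault inserts an empty list the first time a frequency is seen.
    dict_of_lists.setdefault(value, []).append(key)

# Order the groups by ascending frequency.
list_of_lists = [dict_of_lists[freq] for freq in sorted(dict_of_lists)]
print(list_of_lists)  # [['cd', 'de', 'fg'], ['ab', 'gh'], ['bc', 'ef']]
```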

You could do something simple like this:

dictionary = {'a1':2, ..., 'g':100}
MAX_FREQUENCE = max(dictionary.values())  # find the max frequency
list_of_lists = [[] for x in range(MAX_FREQUENCE)]  # generate empty list of lists
for k in dictionary:
    list_of_lists[dictionary[k]-1].append(k)

The -1 is needed because list_of_lists is indexed from 0. Building a list on the fly with [f(x) for x in iterable] is called a list comprehension.
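
The comprehension matters here: it evaluates `[]` once per iteration, so every inner list is a separate object, whereas multiplying a list repeats references to the same inner list. A small sketch of the difference (the variable names are illustrative):

```python
n = 3

# List multiplication copies the reference, so all slots share one list.
aliased = [[]] * n
aliased[0].append('x')
print(aliased)  # [['x'], ['x'], ['x']]

# A comprehension builds a fresh list on every iteration.
independent = [[] for _ in range(n)]
independent[0].append('x')
print(independent)  # [['x'], [], []]
```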

You could just use a default dictionary to store your data:

import collections

dictionary={'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
lists_by_frequency=collections.defaultdict(list)
for s, f in dictionary.iteritems():
        lists_by_frequency[f].append(s)
list_of_lists=[[] for i in xrange(max(lists_by_frequency)+1)]
for f, v in lists_by_frequency.iteritems():
        list_of_lists[f]=v
print lists_by_frequency
print list_of_lists

Output:

defaultdict(<type 'list'>, {1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']})
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

As you can see, each group is stored at the index equal to its frequency. If every frequency is at least one, you could subtract one from each index so that you don't get an empty list at offset zero.
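
A Python 3 sketch of that shifted variant, under the stated assumption that every frequency is at least 1 (the name `shifted` is illustrative):

```python
from collections import defaultdict

dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}

lists_by_frequency = defaultdict(list)
for s, f in dictionary.items():
    lists_by_frequency[f].append(s)

# Assumption: all frequencies are >= 1, so frequency f maps to index f - 1.
shifted = [[] for _ in range(max(lists_by_frequency))]
for f, group in lists_by_frequency.items():
    shifted[f - 1] = group

print(shifted)  # [['cd', 'de', 'fg'], ['ab', 'gh'], ['bc', 'ef']]
```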

The functional way:

import collections

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}

ldict = collections.defaultdict(list)
map(lambda (k, v): ldict[v].append(k), dictionary.iteritems())
list_of_lists = map(lambda x: ldict[x], xrange(0, max(ldict)+1))

print(list_of_lists)

This solution uses the same approach as hochl's solution. It is functional, and therefore shorter, but it typically takes longer to understand. :-)

Comment: It is that 'long' because, IMHO, the dict / defaultdict constructor is too limited for this use.
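
The code above relies on Python 2 behaviour: tuple-unpacking lambdas were removed in Python 3, and `map` there is lazy, so the appends would never run. One possible functional-style alternative for Python 3 swaps in a different technique, `itertools.groupby` over the items sorted by frequency; this is a sketch, not the answer author's method.

```python
from itertools import groupby
from operator import itemgetter

dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}

# groupby only merges adjacent items, so sort by frequency first.
by_freq = sorted(dictionary.items(), key=itemgetter(1))

list_of_lists = [
    [key for key, _ in group]
    for _, group in groupby(by_freq, key=itemgetter(1))
]

print(list_of_lists)  # [['cd', 'de', 'fg'], ['ab', 'gh'], ['bc', 'ef']]
```

Unlike the index-based versions, this produces no empty lists for frequencies that never occur, so a group's position is its rank rather than its frequency.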
