Building a list of lists from a frequency dictionary in Python

Question

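I need help finding a shortcut for building a frequency-sorted list of lists from a frequency dictionary. I can build the list of lists (see below) by appending each element to a list and then appending each list to the "list of lists", which is manageable with only frequencies 1-3. But what happens if I have frequencies up to 100 or more?! There has to be a better way.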

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
list_1 = []
list_2 = []
list_3 = []
list_of_lists = []

for key, value in dictionary.items():
    if value == 1:
        list_1.append(key)
for key, value in dictionary.items():
    if value == 2:
        list_2.append(key)
for key, value in dictionary.items():
    if value == 3:
        list_3.append(key)

list_of_lists.append(list_1)
list_of_lists.append(list_2)
list_of_lists.append(list_3)

print list_of_lists

Running it in Python prints the following:

[['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

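This is exactly what I want, but it will not work for a corpus of 100,000+ words with frequencies of 100+. Please help me find a better, less tedious way to build my list of lists.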

Answer 1

Solution 1 - inverse mapping via a list of lists (what was asked for)

You are looking for something like a histogram, but in reverse.

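def inverseHistogram(valueFreqPairs):
    # One slot per frequency from 0 up to and including the maximum
    maxFreq = max(p[1] for p in valueFreqPairs)+1
    R = [[] for _ in range(maxFreq)]
    for value,freq in valueFreqPairs:
        # The frequency is the index of the bucket the value belongs to
        R[freq].append(value)
    return R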

Demo:

>>> inverseHistogram(dictionary.items())
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

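Solution 2 - inverse mapping via the defaultdict pattern (cleaner)

If you are content to use a dictionary to organize the inverse (which seems more elegant), that is better still. This is what I would personally do.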

import collections

reverseDict = collections.defaultdict(list)
for value,freq in dictionary.items():
    reverseDict[freq].append(value)

Demo:

>>> dict(reverseDict)
{1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']}

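Side note: this also saves space if your frequencies are sparse. For example, if the input is {'onlyitem':999999999}, you avoid building a list larger than memory, which would lock up your machine.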
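To make the sparse case concrete, here is a minimal Python 3 sketch. The helper name `group_by_frequency` is made up for illustration, and the last step (sorting the dictionary keys to recover a frequency-ordered list of lists) is an assumption about what the asker ultimately wants.

```python
from collections import defaultdict

def group_by_frequency(freqs):
    """Map each frequency to the list of keys that have it (hypothetical helper)."""
    groups = defaultdict(list)
    for key, freq in freqs.items():
        groups[freq].append(key)
    return groups

# Sparse input from the side note: one key with a very large frequency.
sparse = {'onlyitem': 999999999}
groups = group_by_frequency(sparse)

# The dict holds a single entry; a list indexed by frequency
# would need roughly a billion slots for the same data.
print(len(groups))   # 1
print(dict(groups))  # {999999999: ['onlyitem']}

# If a frequency-ordered list of lists is still wanted, sort the keys.
dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}
by_freq = group_by_frequency(dictionary)
list_of_lists = [by_freq[f] for f in sorted(by_freq)]
print(list_of_lists)
```

Unlike the list-based version, this leaves no empty placeholder lists for frequencies that never occur.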

Answer 2

The best way: throw them all into a dict

result = {}

for key, value in dictionary.iteritems():
  if value not in result:
    result[value] = []
  result[value].append(key)

Slightly simpler:

from collections import defaultdict
result = defaultdict(list)

for key, value in dictionary.iteritems():
  result[value].append(key)

Or create a list:

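# Use a comprehension so each slot gets its own list;
# [[]] * n would repeat one shared list n times.
result = [[] for _ in range(max(dictionary.values()))]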

for key, value in dictionary.iteritems():
  result[value-1].append(key)
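One thing to watch when preallocating a list of lists is that the inner lists must be distinct objects. The following Python 3 sketch contrasts list multiplication with a comprehension. The names `shared` and `separate` are illustrative only.

```python
n = 3

# List multiplication copies the reference, so every slot is the same list.
shared = [[]] * n
shared[0].append('x')
print(shared)    # [['x'], ['x'], ['x']]

# A comprehension evaluates [] once per slot, giving n separate lists.
separate = [[] for _ in range(n)]
separate[0].append('x')
print(separate)  # [['x'], [], []]
```

With the shared form, every key would end up in every frequency bucket, which is why the comprehension is the safer choice here.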

Answer 3

dict_of_lists = {}

for key, value in dictionary.items():
    if value in dict_of_lists:
        dict_of_lists[value].append(key)
    else:
        dict_of_lists[value] = [key]

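# Dict order is not guaranteed to follow frequency, so sort the keys
list_of_lists = [dict_of_lists[freq] for freq in sorted(dict_of_lists)]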

Answer 4

You can do something simple:

dictionary = {'a1':2, ..., 'g':100}
MAX_FREQUENCE = max([dictionary[k] for k in dictionary])  # find the max frequency
list_of_lists = [[] for x in range(MAX_FREQUENCE)]  # generate empty list of lists
for k in dictionary:
    list_of_lists[dictionary[k]-1].append(k)

The -1 is there because list_of_lists is indexed from 0. The dynamic list construction [f(x) for x in iterable] is called a list comprehension.
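As a runnable illustration of the comprehension and the index offset, here is a Python 3 sketch. It substitutes the sample dictionary from the question for the elided one in this answer, which is an assumption made only so the example runs.

```python
# Sample data taken from the question (assumed here for illustration).
dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}

# [f(x) for x in iterable]: here f(x) looks up the frequency of key x.
frequencies = [dictionary[k] for k in dictionary]
max_frequency = max(frequencies)

# One fresh empty list per frequency 1..max_frequency.
list_of_lists = [[] for _ in range(max_frequency)]

for k in dictionary:
    # Frequency f is stored at index f - 1 because list indices start at 0.
    list_of_lists[dictionary[k] - 1].append(k)

print(list_of_lists)
# [['cd', 'de', 'fg'], ['ab', 'gh'], ['bc', 'ef']]
```

The printed order within each group reflects dictionary insertion order, which Python 3.7 and later preserve.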

Answer 5

You can use a default dictionary to store the data:

import collections

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
lists_by_frequency = collections.defaultdict(list)
for s, f in dictionary.iteritems():
    lists_by_frequency[f].append(s)
list_of_lists = [[] for i in xrange(max(lists_by_frequency)+1)]
for f, v in lists_by_frequency.iteritems():
    list_of_lists[f] = v
print lists_by_frequency
print list_of_lists

Output:

defaultdict(<type 'list'>, {1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']})
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

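As you can see, each group is stored at the index of its frequency. If every frequency is at least 1, you could subtract 1 from the index so that you do not get an empty list at offset zero.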
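The offset-by-one variant might look like the Python 3 sketch below. The input is a hypothetical dictionary with a gap at frequency 2, chosen to show that unused frequencies still leave an empty list in the middle even after the leading empty list is gone.

```python
from collections import defaultdict

# Hypothetical input: frequencies 1 and 3 occur, frequency 2 does not.
dictionary = {'ab': 3, 'cd': 1, 'de': 1}

lists_by_frequency = defaultdict(list)
for s, f in dictionary.items():
    lists_by_frequency[f].append(s)

# Assumption: every frequency is >= 1, so frequency f can live at index f - 1.
assert min(lists_by_frequency) >= 1

list_of_lists = [[] for _ in range(max(lists_by_frequency))]
for f, v in lists_by_frequency.items():
    list_of_lists[f - 1] = v

print(list_of_lists)
# [['cd', 'de'], [], ['ab']]
```

Callers then need to remember that index i holds the keys with frequency i + 1.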

Answer 6

A functional way:

import collections

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}

ldict = collections.defaultdict(list)
map(lambda (k, v): ldict[v].append(k), dictionary.iteritems())
list_of_lists = map(lambda x: ldict[x], xrange(0, max(ldict)+1))

print(list_of_lists)

This solution uses the same approach as hochl's solution. It is functional, so it is shorter, but it usually takes longer to understand. :-)

Comment: it is 'long' because, IMHO, the dict / defaultdict constructors are too limited (for this use).
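The code in this answer relies on Python 2 features: a tuple-unpacking lambda and a `map` that returns a list. For Python 3, one possible functional-style alternative is sketched below. It swaps in `sorted` plus `itertools.groupby`, which is a different technique from the one this answer uses, and it produces no empty lists for frequencies that never occur.

```python
from itertools import groupby
from operator import itemgetter

dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}

# Sort (key, freq) pairs by frequency; groupby only merges adjacent equal keys.
pairs = sorted(dictionary.items(), key=itemgetter(1))

# One list of keys per distinct frequency, in ascending frequency order.
list_of_lists = [
    [key for key, _ in group]
    for _, group in groupby(pairs, key=itemgetter(1))
]

print(list_of_lists)
# [['cd', 'de', 'fg'], ['ab', 'gh'], ['bc', 'ef']]
```

Because `sorted` is stable, keys with the same frequency keep their original dictionary order.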
