
Building a list of lists from a frequency dictionary in Python

I need help finding a shortcut to build a frequency-sorted list of lists from a frequency dictionary. I am able to build a list of lists (see below) by appending each element to a list and then appending each list to the 'list of lists' (easy with only frequencies 1-3), but what happens if I have frequencies up to 100 or more? There has to be a better way.

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
list_1 = []
list_2 = []
list_3 = []
list_of_lists = []

for key, value in dictionary.items():
    if value == 1:
            list_1.append(key)
for key, value in dictionary.items():
    if value == 2:
            list_2.append(key)
for key, value in dictionary.items():
    if value == 3:
            list_3.append(key)

list_of_lists.append(list_1)
list_of_lists.append(list_2)
list_of_lists.append(list_3)

print list_of_lists

The output of a run in Python looks like this:

[['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

This is exactly what I want, but it won't work for a corpus of 100,000+ words with frequencies of 100+. Please help me find a better, less tedious way of building my list of lists.


solution 1 - Inverse-mapping via list-of-lists (what was asked for)

You are looking for something like a histogram, but the inverse.

def inverseHistogram(valueFreqPairs):
    maxFreq = max(p[1] for p in valueFreqPairs)+1
    R = [[] for _ in range(maxFreq)]
    for value,freq in valueFreqPairs:
        R[freq] += [value]
    return R

Demo:

>>> inverseHistogram(dictionary.items())
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

solution 2 - Inverse-mapping via defaultdict pattern (much cleaner)

Even better if you are content with using a dictionary to organize the inverse (which seems more elegant). This is how I'd personally do it.

import collections

reverseDict = collections.defaultdict(list)
for value,freq in dictionary.items():
    reverseDict[freq].append(value)

Demo:

>>> dict(reverseDict)
{1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']}

Side note: this will also save you space if your frequencies are sparse. For example, if your input was {'onlyitem':999999999}, you avoid having to make a list larger than your memory, which could lock your machine up.
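
If a list of lists is still what you need at the end, the dictionary can be turned into one by walking its keys in sorted order. A minimal sketch, assuming the question's sample data; the name `groups` is only illustrative, and frequencies that never occur are skipped rather than stored as empty lists:

```python
import collections

dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}

# Invert the mapping: frequency -> list of keys with that frequency.
reverseDict = collections.defaultdict(list)
for value, freq in dictionary.items():
    reverseDict[freq].append(value)

# Walk the frequencies in ascending order; only frequencies that
# actually occur get a sublist, so sparse input stays small.
groups = [reverseDict[freq] for freq in sorted(reverseDict)]
print(groups)
```

The order of the keys inside each sublist follows the dictionary's iteration order, so it may differ between Python versions.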

Best way: throw them all into a dict

result = {}

for key, value in dictionary.iteritems():
  if value not in result:
    result[value] = []
  result[value].append(key)

Slightly simpler:

from collections import defaultdict
result = defaultdict(list)

for key, value in dictionary.iteritems():
  result[value].append(key)

Or to create a list:

result = [[] for _ in range(max(dictionary.values()))]  # one separate list per frequency

for key, value in dictionary.iteritems():
  result[value-1].append(key)
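
One thing to watch when pre-allocating the buckets: every bucket has to be its own list object. The sketch below (variable names are illustrative) contrasts list repetition with a list comprehension:

```python
n = 3

# Repeating a one-element list copies the reference, not the list,
# so all n slots point at the same inner list.
shared = [[]] * n
shared[0].append('x')
print(shared)    # [['x'], ['x'], ['x']]

# A list comprehension evaluates [] once per iteration,
# which gives n distinct lists.
separate = [[] for _ in range(n)]
separate[0].append('x')
print(separate)  # [['x'], [], []]
```

With the repeated form, every key would end up in every bucket, whatever its frequency.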

dict_of_lists = {}

for key, value in dictionary.items():
    if value in dict_of_lists:
        dict_of_lists[value].append(key)
    else:
        dict_of_lists[value] = [key]

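# dict order is not guaranteed to follow the frequencies, so sort the keys explicitly
list_of_lists = [dict_of_lists[freq] for freq in sorted(dict_of_lists)]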

You could do something simple like this:

dictionary = {'a1':2, ..., 'g':100}
MAX_FREQUENCE = max([dictionary[k] for k in dictionary])  # find the max frequency
list_of_lists = [[] for x in range(MAX_FREQUENCE)]  # generate empty list of lists
for k in dictionary:
    list_of_lists[dictionary[k]-1].append(k)  # frequency f goes to index f-1

The -1 is needed because the indices of list_of_lists start at 0. Building a list on the fly with [f(x) for x in iterable] is called a list comprehension.

You could just use a default dictionary to store your data:

import collections

dictionary={'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}
lists_by_frequency=collections.defaultdict(list)
for s, f in dictionary.iteritems():
        lists_by_frequency[f].append(s)
list_of_lists=[[] for i in xrange(max(lists_by_frequency)+1)]
for f, v in lists_by_frequency.iteritems():
        list_of_lists[f]=v
print lists_by_frequency
print list_of_lists

Output:

defaultdict(<type 'list'>, {1: ['de', 'cd', 'fg'], 2: ['ab', 'gh'], 3: ['ef', 'bc']})
[[], ['de', 'cd', 'fg'], ['ab', 'gh'], ['ef', 'bc']]

As you can see, each group is stored at the index equal to its frequency. If every frequency is at least one, you can shift each index down by one (store frequency f at index f-1) so that you don't get an empty list at offset zero.
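
A minimal sketch of that adjustment, assuming every frequency is at least 1. The helper name `group_by_frequency` is hypothetical, and `.items()` is used instead of `.iteritems()` so that it runs on both Python 2 and Python 3:

```python
import collections

def group_by_frequency(freqs):
    """Return a list whose index f-1 holds the keys that have frequency f."""
    by_freq = collections.defaultdict(list)
    for key, f in freqs.items():
        by_freq[f].append(key)
    if not by_freq:
        return []
    # Assumes all frequencies are >= 1, so frequency f maps to index f-1.
    # .get() is used so that missing frequencies yield a fresh empty list
    # without adding new entries to the defaultdict.
    return [by_freq.get(f, []) for f in range(1, max(by_freq) + 1)]

dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}
list_of_lists = group_by_frequency(dictionary)
print(len(list_of_lists))  # 3: one sublist per frequency from 1 to 3
```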

The functional way:

import collections

dictionary = {'ab':2, 'bc':3, 'cd':1, 'de':1, 'ef':3, 'fg':1, 'gh':2}

ldict = collections.defaultdict(list)
map(lambda (k, v): ldict[v].append(k), dictionary.iteritems())
list_of_lists = map(lambda x: ldict[x], xrange(0, max(ldict)+1))

print(list_of_lists)

This solution uses the same approach as the solution from hochl. It is functional and therefore shorter, but it typically takes longer to understand. :-)

Comment: It is that 'long' because IMHO the dict / defaultdict constructor is (for this use) too limited.
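
Note that `lambda (k, v): ...` relies on tuple parameter unpacking, which Python 3 removed, and that Python 3's `map` is lazy, so calling it only for its side effects does nothing until the result is consumed. As a possible alternative in the same functional spirit, here is a sketch that swaps in `sorted` plus `itertools.groupby` (a different technique from the one above). Unlike the version above, it omits frequencies that never occur instead of storing empty lists:

```python
import itertools
import operator

dictionary = {'ab': 2, 'bc': 3, 'cd': 1, 'de': 1, 'ef': 3, 'fg': 1, 'gh': 2}

by_freq = operator.itemgetter(1)  # picks the frequency out of a (key, freq) pair

# groupby only merges adjacent items, so sort the pairs by frequency first.
pairs = sorted(dictionary.items(), key=by_freq)
list_of_lists = [[key for key, _ in group]
                 for _, group in itertools.groupby(pairs, key=by_freq)]

print(list_of_lists)
```

The sort makes this O(n log n) rather than the single pass of the defaultdict approach, which may matter for a very large corpus.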
