
Creating a dictionary of frequencies from list of tuples

I've got

d = [(4, 1), (4, 1), (4, 1), (4, 1), (4, 3), (4, 2), (4, 2), (4, 4), (4, 1), (4, 3), (4, 1), (4, 1), (4, 2), (4, 1)] 

but many times larger.

The first number in each tuple is the month and the second is the number of incidents. I need to add up the incidents for each month to get a total per month. So far I have:

def histogram(L):
    y = {}
    for x in L:
        if x[0] in y.keys():
            y[x] = y[x] + x[1]
        else:
            y[x] = x[1]
    return y

I need output similar to y = {4: 24} (it doesn't have to be a dictionary), but covering a range of months, as list d is quite extensive.

The current output is:

{(4, 2): 2, (4, 4): 4, (4, 1): 1, (4, 3): 3}

thanks

You could use Counter. I also added some extra data to your example.

d = [(4, 1), (4, 1), (4, 1), (4, 1), (4, 3), (4, 2), (4, 2), (4, 4), (4, 1), (4, 3), (4, 1), (4, 1), (4, 2), (4, 1), (5,1), (5,2)]

from collections import Counter

counter = Counter()

for x, y in d:
    counter[x]+=y

Then counter == Counter({4: 24, 5: 3}).
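
The reason no initialization is needed is that a Counter reads missing keys as 0. A small sketch on a shortened sample (the data below is made up for illustration, not the original list):

```python
from collections import Counter

# Shortened sample, made up for illustration.
d = [(4, 1), (4, 3), (5, 1), (5, 2)]

counter = Counter()
print(counter[4])  # missing keys read as 0, so += needs no setup

for month, n in d:
    counter[month] += n

print(dict(counter))            # {4: 4, 5: 3}
print(sorted(counter.items()))  # [(4, 4), (5, 3)]
```

dict(counter) gives a plain dictionary if you don't want a Counter in the end.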

You can use itertools.groupby with a dict comprehension here (assuming the data is sorted by month):

>>> from operator import itemgetter
>>> from itertools import groupby
>>> {k: sum(x for _, x in g) for k, g in groupby(d, key=itemgetter(0))}
{4: 24}
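
Note that groupby only merges adjacent items. If the list is not sorted by month, the same month shows up in several groups, and later groups overwrite earlier ones in the dict. A minimal sketch, assuming an unsorted sample (the `unsorted` list is made up for illustration), that sorts first:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted sample: months 4 and 5 are interleaved.
unsorted = [(4, 1), (5, 2), (4, 3), (5, 1), (4, 2)]

# Without sorting, every run of equal months forms its own group,
# and later groups overwrite earlier ones in the dict.
naive = {k: sum(c for _, c in g)
         for k, g in groupby(unsorted, key=itemgetter(0))}

# Sorting by month first makes each month one contiguous group.
by_month = sorted(unsorted, key=itemgetter(0))
totals = {k: sum(c for _, c in g)
          for k, g in groupby(by_month, key=itemgetter(0))}

print(naive)   # {4: 2, 5: 1}  (wrong: only the last run per month survives)
print(totals)  # {4: 6, 5: 3}
```

The sort costs O(n log n), so for unsorted data a single pass with a dict or Counter is usually the simpler choice.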

To improve your code, the first thing to do is remove the .keys() call (though it won't matter much here, as there are at most 12 months), because a plain key in dct test looks the key up in O(1) time. Another problem is that you're using the whole tuple x as the key, but you're supposed to use x[0] (the month) as the key:

def histogram(L):
    y = {}
    for m, c in L:            # take advantage of tuple unpacking
        y[m] = y.get(m, 0) + c
    return y

If you're sure you're always going to need all 12 months in your dict, then initialize all the months first:

def histogram(L):
    y = dict.fromkeys(range(1, 13), 0)
    for m, c in L:
        y[m] += c
    return y
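
For comparison, here is a rough sketch of how the two versions behave on the sample data. The names `histogram_sparse` and `histogram_full` are mine, only there to tell the two apart:

```python
def histogram_sparse(L):
    # Only months that actually occur end up as keys.
    y = {}
    for m, c in L:
        y[m] = y.get(m, 0) + c
    return y


def histogram_full(L):
    # Every month 1..12 is present from the start, defaulting to 0.
    y = dict.fromkeys(range(1, 13), 0)
    for m, c in L:
        y[m] += c
    return y


d = [(4, 1), (4, 1), (4, 1), (4, 1), (4, 3), (4, 2), (4, 2),
     (4, 4), (4, 1), (4, 3), (4, 1), (4, 1), (4, 2), (4, 1)]

sparse = histogram_sparse(d)
full = histogram_full(d)

print(sparse)     # {4: 24}
print(full[4])    # 24
print(full[1])    # 0
print(len(full))  # 12
```

One thing to keep in mind: the pre-initialized version raises KeyError if a tuple contains a month outside 1..12, while the .get() version would silently add it.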

This should solve it.

d = [(4, 1), (4, 1), (4, 1), (4, 1), (4, 3), (4, 2), (4, 2), (4, 4), (4, 1), (4, 3), (4, 1), (4, 1), (4, 2), (4, 1)]

def histogram(L):
    y = {}
    for t in L:
        month = t[0]
        freq = t[1]
        try:
            y[month] += freq
        except KeyError:
            y[month] = 0
            y[month] += freq
    return y

print(histogram(d))

I changed the names of your variables a bit.

incidents = [(4, 1), (4, 1), (4, 1), (4, 1),
             (4, 3), (4, 2), (4, 2), (4, 4),
             (4, 1), (4, 3), (4, 1), (4, 1),
             (4, 2), (4, 1)]
inc_by_m = {}
for m, n in incidents:
    inc_by_m[m] = inc_by_m.get(m,0)+n
print(inc_by_m)
# {4: 24}

This straightforward code relies on the optional second argument (here 0) of the dictionary's .get() method: get returns the value stored under the key given as its first argument if that key was previously set, or the optional argument if it was not.
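
A tiny sketch of that .get() behaviour on its own (the `counts` dict is just an illustration):

```python
counts = {}

print(counts.get(4, 0))  # key 4 is missing, so the default comes back: 0

counts[4] = counts.get(4, 0) + 3
print(counts.get(4, 0))  # key 4 is set now: 3

print(counts.get(5))     # no default given, so a missing key yields None
print(counts)            # {4: 3}, because get itself never inserts a key
```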
