How to sum values for the same key

I have a file

gu|8
gt|5
gr|5
gp|1
uk|2
gr|20
gp|98
uk|1
me|2
support|6

And I want to have one number per TLD like:

 gr|25
 gp|99
 uk|3
 me|2
 support|6
 gu|8
 gt|5

and here is my code:

f = open(file,'r')
d={}
for line in f:
    line = line.strip('\n')
    TLD,count = line.split('|')
    d[TLD] = d.get(TLD)+count

print d

But I get this error:

    d[TLD] = d.get(TLD)+count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Can anybody help?

Taking a look at the full traceback:

Traceback (most recent call last):
  File "mee.py", line 6, in <module>
    d[TLD] = d.get(TLD) + count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

The error is telling us that we tried to add something of type NoneType to something of type str, which isn't allowed in Python.

There's only one object of type NoneType, which, unsurprisingly, is None – so we know that we tried to add a string to None.

The two things we tried to add together in that line were d.get(TLD) and count, and looking at the documentation for dict.get(), we see that what it does is

Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
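As a quick illustration of that documented behaviour, here is a minimal sketch using a toy dictionary (the dictionary contents and variable names are made up for the example, not taken from the script):

```python
# Toy dictionary for illustration only; the keys mirror the sample data.
d = {'gr': 5}

found = d.get('gr')            # key present: its value is returned
missing = d.get('uk')          # key absent, no default: None is returned
with_default = d.get('uk', 0)  # key absent, default given: the default is returned
```

Note that get() only looks the key up; it does not add 'uk' to the dictionary in either case.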

Since we didn't supply a default, d.get(TLD) returned None when it didn't find TLD in the dictionary, and we got the error attempting to add count to it. So, let's supply a default of 0 and see what happens:

f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + count

print d

$ python mee.py
Traceback (most recent call last):
  File "mee.py", line 6, in <module>
    d[TLD] = d.get(TLD, 0) + count
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Well, we've still got an error, but now the problem is that we're trying to add a string to an integer, which is also not allowed, because it would be ambiguous.
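To make the ambiguity concrete, here is a small sketch with a made-up value (the variable names are hypothetical): the same two operands give different results depending on whether "+" is taken to mean concatenation or addition, so Python refuses to guess.

```python
count = '5'   # a string, like the values read from the file

as_text = str(0) + count      # both strings: concatenation
as_number = 0 + int(count)    # both integers: addition

# Mixing the two types raises TypeError instead of picking one meaning.
try:
    0 + count
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

In the script, count is a string of exactly this kind rather than a number.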

That's happening because line.split('|') returns a list of strings – so we need to explicitly convert count to an integer:

f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + int(count)

print d

... and now it works:

$ python mee.py 
{'me': 2, 'gu': 8, 'gt': 5, 'gr': 25, 'gp': 99, 'support': 6, 'uk': 3}

Turning that dictionary back into the file output you want is a separate issue (and not attempted by your code), so I'll leave you to work on that.

To answer the title of your question, "how to sum values for the same key": the standard library has a class called collections.Counter that is a good match for this:

import collections
d = collections.Counter()
with open(file) as f:
    # iterate over the lines; a missing key in a Counter counts as 0
    for line in f:
        tld, cnt = line.strip().split('|')
        d[tld] += int(cnt)

then to write back:

with open(file, 'w') as f:
    for tld, cnt in sorted(d.items()):
        print >> f, "%s|%d" % (tld, cnt)
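For reference, the two steps above can be sketched end to end on an in-memory list, so it runs without touching a file. This is only an illustration under stated assumptions: the helper names sum_by_key and format_totals are made up here, the input is the sample data from the question, and the code is written to run on both Python 2 and Python 3.

```python
import collections

def sum_by_key(lines):
    """Sum the integer after '|' for each key before it (hypothetical helper)."""
    totals = collections.Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, count = line.split('|')
        totals[key] += int(count)  # missing keys start at 0 in a Counter
    return totals

def format_totals(totals):
    """Return 'key|total' strings, sorted by key (hypothetical helper)."""
    return ["%s|%d" % (key, total) for key, total in sorted(totals.items())]

# Sample data copied from the question.
sample = ["gu|8", "gt|5", "gr|5", "gp|1", "uk|2",
          "gr|20", "gp|98", "uk|1", "me|2", "support|6"]
totals = sum_by_key(sample)
result = format_totals(totals)
```

Joining result with newlines gives the text to write back; sorting by key is just one choice, since the question does not specify an order.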
