
How can I count different values per same key with Python?

I have code that produces a list like this:

Name    id number  week number
Piata   4          6
Mali    2          20,5
Goerge  5          4
Gooki   3          24,64,6
Mali    5          45,9
Piata   6          1
Piata   12         2,7,8,27,16
etc.

using the code below:

import csv
import datetime
from datetime import date

with open('d:/info.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, 'excel')
    # flag used to skip the header row
    read_header = False
    start_date = date(year=2009, month=1, day=1)
    tdic = {}
    for row in filereader:
        if not read_header:
            read_header = True
            continue

        # read the remaining rows
        name, id, firstseen = row[0], row[1], row[3]
        try:
            seen_date = datetime.datetime.strptime(firstseen, '%d/%m/%Y').date()
            deltadays = (seen_date - start_date).days
            deltaweeks = deltadays // 7 + 1  # whole week number, counting from 1
            key = name, id
            currentvalue = tdic.get(key, set())
            currentvalue.add(deltaweeks)
            tdic[key] = currentvalue
        except ValueError:
            print('Date value error')

Now I want to convert this into a list that gives, for each name, the number of ids and all of its week numbers, like this:

Name     number of ids      weeknumbers
Mali         2                20,5,45,9
Piata        3               1,6,2,7,8,27,16
Goerge       1                   4
Gooki        1                 24,64,6

Can anyone help me with writing the code for this part?

Given that:

tdict = {('Mali', 5): set([9, 45]), ('Gooki', 3): set([24, 64, 6]), ('Goerge', 5): set([4]), ('Mali', 2): set([20, 5]), ('Piata', 4): set([6]), ('Piata', 6): set([1]), ('Piata', 12): set([8, 16, 2, 27, 7])}

then to output the result above:

names = {}
for (name, id), more_weeks in tdict.items():
    # start from (0 ids, no weeks) the first time a name is seen
    ids, weeks = names.get(name, (0, set()))
    ids = ids + 1
    weeks = weeks.union(more_weeks)
    names[name] = (ids, weeks)

for name, (ids, weeks) in names.items():
    print("%s, %s, %s" % (name, ids, weeks))
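The loop above prints each set of weeks in its raw form. To get closer to the table layout asked for in the question, something like the following sketch could be used. The helper name format_row, the column widths, and the sample `names` dictionary are assumptions made for illustration only.

```python
# Hypothetical result of the aggregation loop: {name: (id_count, weeks)}
names = {
    'Mali': (2, {20, 5, 45, 9}),
    'Goerge': (1, {4}),
}


def format_row(name, id_count, weeks):
    """Format one summary line with sorted, comma-separated week numbers."""
    week_text = ",".join(str(week) for week in sorted(weeks))
    return "%-10s %-15s %s" % (name, id_count, week_text)


header = "%-10s %-15s %s" % ("Name", "number of ids", "weeknumbers")
lines = [header]
# sort by name so the output order is stable
for name, (id_count, weeks) in sorted(names.items()):
    lines.append(format_row(name, id_count, weeks))

print("\n".join(lines))
```

Sorting the weeks is a presentation choice; sets have no defined order, so the order shown in the question cannot be reproduced from them.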

Since it looks like your csv file has headers (which you are currently ignoring) why not use a DictReader instead of the standard reader class? If you don't supply fieldnames the DictReader will assume the first line contains them, which will also save you from having to skip the first line in your loop.
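As a small illustration of that header handling, here is a sketch that reads from an in-memory string instead of d:/info.csv. The column names and sample rows are assumptions, since the real header line is not shown in the question.

```python
import csv
import io

# Hypothetical sample data; the real file's column names may differ.
sample_csv = io.StringIO(
    "name,id,other,firstseen\n"
    "Mali,2,x,20/05/2009\n"
    "Piata,4,y,05/02/2009\n"
)

# With no fieldnames argument, the first line is consumed as the field names,
# so the loop only ever sees data rows.
reader = csv.DictReader(sample_csv)
rows = list(reader)
for row in rows:
    print(row['name'], row['id'], row['firstseen'])
```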

This seems like a great opportunity to use defaultdict and Counter from the collections module.

import csv
from datetime import date, datetime
from collections import defaultdict, Counter


datedict = defaultdict(set)
namecounter = Counter()
with open('d:/info.csv', 'r') as csvfile:
    filereader = csv.DictReader(csvfile)
    start_date = date(year=2009, month=1, day=1)

    for row in filereader:
        # column names are assumed to match the file's header row
        name, id, firstseen = row['name'], row['id'], row['firstseen']

        try:
            seen_date = datetime.strptime(firstseen, '%d/%m/%Y').date()
        except ValueError:
            print('Date value error')
            continue  # skip rows whose date cannot be parsed

        deltadays = (seen_date - start_date).days
        deltaweeks = deltadays // 7 + 1  # floor division keeps the week number an integer

        datedict[name].add(deltaweeks)
        namecounter.update([name])  # Without putting name into a list, update will count each character

This assumes that each (name, id) pair appears only once in the file. If that is not the case, you can use another defaultdict for namecounter. I've also narrowed the try-except block so that it wraps only the date parsing, which makes it explicit what is being tested.
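A minimal sketch of that defaultdict variant, assuming the goal is to count distinct ids per name even when a (name, id) pair is repeated. The sample pairs and the name ids_by_name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (name, id) pairs, including a repeated pair.
pairs = [('Piata', '4'), ('Piata', '6'), ('Piata', '4'), ('Mali', '2')]

ids_by_name = defaultdict(set)
for name, id_ in pairs:
    ids_by_name[name].add(id_)  # a set ignores ids it already holds

# the number of distinct ids is the size of each set
id_counts = {name: len(ids) for name, ids in ids_by_name.items()}
print(id_counts)
```

With a Counter, the repeated ('Piata', '4') pair would be counted twice; collecting ids in a set avoids that.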
