Averaging values in a dictionary by key

Question

I'm new to Python and I have a set of values like the following:

(3, '655')
(3, '645')
(3, '641')
(4, '602')
(4, '674')
(4, '620')

These are generated from a CSV file with the following code (Python 2.6):

import csv
import time

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        data = date, row[5]

        month = data[0][1]
        avg = data[1]
        monthAvg = month, avg
        print monthAvg

What I would like to do is get the average per key:

(3, 647)
(4, 632)

My initial thought was to create a new dictionary.

loop through the original dictionary
    if the key does not exist
        add the key and value to the new dictionary
    else
        sum the value to the existing value in the new dictionary

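I would also have to keep a count of how many values each key has, so that I can compute the average. This seems like a lot of work, and I'm not sure whether there is a more elegant way to achieve it.

Thanks.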

Answer 1

You can use collections.defaultdict to create a dictionary that maps each unique key to a list of its values:

>>> l=[(3, '655'),(3, '645'),(3, '641'),(4, '602'),(4, '674'),(4, '620')]
>>> from collections import defaultdict
>>> d=defaultdict(list)
>>> 
>>> for i,j in l:
...    d[i].append(int(j))
... 
>>> d
defaultdict(<type 'list'>, {3: [655, 645, 641], 4: [602, 674, 620]})

Then use a list comprehension to create the expected pairs:

>>> [(i,sum(j)/len(j)) for i,j in d.items()]
[(3, 647), (4, 632)]

In your code you could do:

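import csv
import time
from collections import defaultdict

d = defaultdict(list)  # month -> list of values

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        data = date, row[5]

        month = data[0][1]
        avg = data[1]
        d[month].append(int(avg))

# print once, after all rows have been collected
print [(i,sum(j)/len(j)) for i,j in d.items()]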
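
The grouping can also be done without keeping every value in a list, which is closer to the running sum and count described in the question. The following is a minimal sketch for Python 3 (not the Python 2.6 used above); the function name average_by_key is made up for illustration, and the sample pairs are the ones from the question.

```python
from collections import defaultdict


def average_by_key(pairs):
    """Return {key: mean} for an iterable of (key, numeric-string) pairs."""
    totals = defaultdict(float)  # running sum per key
    counts = defaultdict(int)    # number of values seen per key
    for key, value in pairs:
        totals[key] += float(value)
        counts[key] += 1
    # Every key in totals has at least one value, so counts[key] is never 0.
    return {key: totals[key] / counts[key] for key in totals}


pairs = [(3, '655'), (3, '645'), (3, '641'),
         (4, '602'), (4, '674'), (4, '620')]
averages = average_by_key(pairs)
print(sorted(averages.items()))  # [(3, 647.0), (4, 632.0)]
```

In Python 3 the / operator always performs true division, so the results are floats rather than the truncated integers that Python 2 prints above.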

Answer 2

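Use pandas. It is designed for exactly this kind of task, so you can express it in very little code (what you want is a one-liner). With a large number of values it will also typically be much faster than a pure-Python loop.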

import pandas as pd

a=[(3, '655'),
   (3, '645'),
   (3, '641'),
   (4, '602'),
   (4, '674'),
   (4, '620')]

res = pd.DataFrame(a).astype('float').groupby(0).mean()
print(res)

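This gives a mean of 647.0 for key 3 and 632.0 for key 4.

Here is a multi-line version that shows what happens at each step: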

df = pd.DataFrame(a)  # construct a structure containing data
df = df.astype('float')  # convert data to float values
grp = df.groupby(0)  # group the values by the value in the first column
df = grp.mean()  # take the mean of each group

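Also, if you want to work from the CSV file, it is even easier, because you don't need to parse the file yourself (I used placeholder names for the columns I don't know):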

import pandas as pd
# read_csv takes the column names via names=; header=None means the file has no header row
df = pd.read_csv('file.csv', names=['col0', 'col1', 'col2', 'date', 'col4', 'data'], header=None)
df['month'] = pd.DatetimeIndex(df['date']).month
df = df[['month', 'data']].groupby('month').mean()

Answer 3

Use a dict comprehension, where items is the list of tuple pairs:

data = {i:[int(b) for a, b in items if a == i] for i in set(a for a, b in items)}
data = {a:int(float(sum(b))/float(len(b))) for a, b in data.items()} # averages

Answer 4

import itertools,csv
from dateutil.parser import parse as dparse

def make_tuples(fname='file.csv'):
    with open(fname, 'rb') as csvfile:
        rows = list(csv.reader(csvfile))
        # groupby only merges consecutive rows, so the rows must already be ordered by month
        for month,group in itertools.groupby(rows,lambda x:dparse(x[3]).strftime("%b")):
            values = [float(row[5]) for row in group]  # column 5 holds the numbers as strings
            yield (month,sum(values)/len(values))

print dict(make_tuples('some_csv.csv'))

is one way to do it...
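
Because itertools.groupby only groups adjacent items, it helps to sort by the key first when the input order is not guaranteed. The following is a small standard-library sketch for Python 3 that applies the same idea to the (key, value) pairs from the question instead of CSV rows; the name mean_by_group is hypothetical and the pairs are deliberately shuffled.

```python
from itertools import groupby
from operator import itemgetter


def mean_by_group(pairs):
    """Yield (key, mean) for (key, numeric-string) pairs, in key order."""
    # groupby only merges adjacent items, so sort by key first.
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [float(value) for _, value in group]
        yield key, sum(values) / len(values)


pairs = [(4, '602'), (3, '655'), (4, '674'),
         (3, '645'), (3, '641'), (4, '620')]
result = dict(mean_by_group(pairs))
print(result)  # {3: 647.0, 4: 632.0}
```

Sorting costs O(n log n), so for large inputs a dictionary-based approach may be preferable.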


4 answers

Answer 1
4 accepted 2015-04-10 15:45:21

Answer 2
2 2015-04-10 16:00:54

Answer 3
1 2015-04-10 15:53:32

Answer 4
1 2015-04-10 15:58:00
