
Average values in nested dictionary

I would like to create a new list of values, my_qty, where each item is the average of all d[key]['qty'] values whose d[key]['start date'] matches the corresponding value in my_dates. I think I am close, but I am getting hung up on the nested portion.

import datetime
import numpy as np
my_dates = [datetime.datetime(2014, 10, 12, 0, 0), datetime.datetime(2014, 10, 13, 0, 0), datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 12},
    'ID2' : {'start date': datetime.datetime(2014, 10, 13, 0, 0) , 'qty': 34},
    'ID3' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 35},
    'ID4' : {'start date': datetime.datetime(2014, 10, 11, 0, 0) , 'qty': 40},
}

my_qty = []
for item in my_dates:
  my_qty.append([np.mean(x for x in d[key]['qty']) if d[key]['start date'] == my_dates[item]])

print my_qty

Desired output:

[23.5,34,0]

To clarify the output per request:

[average of d[key]['qty'] where d[key]['start date'] == my_dates[0],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[1],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[2]]

With pure Python

The simple way is to group the quantities by date into a dictionary:

import collections

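# Map each start date to the list of quantities recorded for it.
quantities = collections.defaultdict(list)

# .items() works on both Python 2 and Python 3.
for k, v in d.items():
    quantities[v["start date"]].append(v["qty"])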

Then run over that dictionary to compute the means:

means = {k: float(sum(q)) / len(q) for k, q in quantities.items()}

Giving:

>>> means
{datetime.datetime(2014, 10, 11, 0, 0): 40.0,
 datetime.datetime(2014, 10, 12, 0, 0): 23.5,
 datetime.datetime(2014, 10, 13, 0, 0): 34.0}
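To get from such a dictionary to the list the question asks for, one option is to look up each entry of my_dates and fall back to 0 when a date has no records. The following is a minimal, self-contained sketch in Python 3 syntax; the sample data is copied from the question, and the fallback value of 0 is taken from the desired output:

```python
import collections
import datetime

my_dates = [datetime.datetime(2014, 10, 12), datetime.datetime(2014, 10, 13),
            datetime.datetime(2014, 10, 14)]

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11), 'qty': 40},
}

# Group quantities by start date, then average each group.
quantities = collections.defaultdict(list)
for record in d.values():
    quantities[record['start date']].append(record['qty'])
means = {day: sum(q) / len(q) for day, q in quantities.items()}

# Look up each requested date; dates with no records fall back to 0.
my_qty = [means.get(day, 0) for day in my_dates]
print(my_qty)  # [23.5, 34.0, 0]
```

Using dict.get keeps the lookup to one line and leaves dates that are not in my_dates (here 2014-10-11) out of the result.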

If you want to be clever, you can compute the mean in a single pass by keeping the current mean and a count of the values seen so far. You can even abstract this in a class:

class RunningMean(object):
    def __init__(self, mean=None, n=0):
        self.mean = mean
        self.n = n

    def insert(self, other):
        if self.mean is None:
            self.mean = 0.0
        self.mean = (self.mean * self.n + other) / (self.n + 1)
        self.n += 1

    def __repr__(self):
        args = (self.__class__.__name__, self.mean, self.n)
        return "{}(mean={}, n={})".format(*args)

And one pass through your data will give you your answer:

import collections

# Each new date starts with an empty RunningMean.
means = collections.defaultdict(RunningMean)
for k, v in d.items():
    means[v["start date"]].insert(v["qty"])
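A running mean can also be updated as mean += (x - mean) / n, which is algebraically the same as (mean * (n - 1) + x) / n but does not rebuild the running sum on every call. The sketch below is a variant for illustration, not the class from this answer: the name IncrementalMean is made up, the sample data is copied from the question, and the syntax is Python 3.

```python
import collections
import datetime


class IncrementalMean:
    """Hypothetical single-pass mean using the incremental update rule."""

    def __init__(self):
        self.mean = 0.0
        self.n = 0

    def insert(self, value):
        self.n += 1
        # Equivalent to (old_mean * (n - 1) + value) / n.
        self.mean += (value - self.mean) / self.n


d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11), 'qty': 40},
}

# One pass over the data: each record updates the mean for its date.
means = collections.defaultdict(IncrementalMean)
for record in d.values():
    means[record['start date']].insert(record['qty'])

# Plain dict of date -> mean, convenient for lookups afterwards.
result = {day: m.mean for day, m in means.items()}
```

For 2014-10-12 the updates run 0.0 -> 12.0 -> 23.5, matching the grouped computation.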

With pandas

The really simple way is to use the pandas library, as it was made for things like this. Here's some code:

import pandas as pd

# One row per ID, with columns "start date" and "qty".
df = pd.DataFrame.from_dict(d, orient="index")
# Average qty within each start date.
means = df.groupby("start date").mean()

Giving:

>>> means
             qty
start date      
2014-10-11  40.0
2014-10-12  23.5
2014-10-13  34.0

The one-line answer:

mean_qty = [np.mean([i['qty'] for i in d.values()
                     if i.get('start date') == day] or 0)
            for day in my_dates]

In [12]: mean_qty
Out[12]: [23.5, 34.0, 0.0]

The purpose of "or 0" is to return 0, as the OP wanted, when there are no qty values for a date. An empty list is falsy, so np.mean receives 0 instead; called on an empty list, np.mean returns nan by default.

If you need speed, you can build on jme's excellent second part like this (I cut his time down by about 3x by not recalculating the mean until it is called for):

from collections import defaultdict

class RunningMean(object):
    def __init__(self, total=0.0, n=0):
        self.total = total
        self.n = n

    def __iadd__(self, other):
        # Only accumulate here; the division is deferred to mean().
        self.total += other
        self.n += 1
        return self

    def mean(self):
        return self.total / self.n if self.n else 0

    def __repr__(self):
        return "RunningMean(total=%f, n=%i)" % (self.total, self.n)

means = defaultdict(RunningMean)
for v in d.values():
    means[v["start date"]] += v["qty"]

Out[351]: 
[RunningMean(mean= 40.000000),
 RunningMean(mean= 34.000000),
 RunningMean(mean= 23.500000)]

Here is some working code which should help you:

for item in my_dates:
  nums = [ d[key]['qty'] for key in d if d[key]['start date'] == item ]
  if len(nums):
    avg = np.mean(nums)
  else:
    avg = 0
  print item, nums, avg

Note that np.mean does not give a usable result on an empty list (it returns nan and emits a warning), so you have to check how many numbers you are about to average.
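The same guard can be wrapped in a small helper. The sketch below is only an illustration under stated assumptions: it swaps np.mean for statistics.mean from the standard library (which raises StatisticsError on empty input instead of returning nan), the name mean_or_default is made up here, and the syntax is Python 3.

```python
import statistics


def mean_or_default(values, default=0):
    """Return the arithmetic mean of values, or default if there are none."""
    values = list(values)  # also accepts generators
    return statistics.mean(values) if values else default


print(mean_or_default([12, 35]))  # 23.5
print(mean_or_default([]))        # 0
```

Making the fallback a parameter lets the caller choose between 0, None, or any other marker for dates without records.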

