
Average values in nested dictionary

I would like to create a new list of values, my_qty, where each item is the average of all d[key]['qty'] values whose d[key]['start date'] matches the corresponding date in my_dates. I think I am close, but I am getting hung up on the nested portion.

import datetime
import numpy as np
my_dates = [datetime.datetime(2014, 10, 12, 0, 0), datetime.datetime(2014, 10, 13, 0, 0), datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 12},
    'ID2' : {'start date': datetime.datetime(2014, 10, 13, 0, 0) , 'qty': 34},
    'ID3' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 35},
    'ID4' : {'start date': datetime.datetime(2014, 10, 11, 0, 0) , 'qty': 40},
}

my_qty = []
for item in my_dates:
  my_qty.append([np.mean(x for x in d[key]['qty']) if d[key]['start date'] == my_dates[item]])

print my_qty

Desired Output:

[23.5,34,0]

To clarify the output per request:

[average of d[key]['qty'] where d[key]['start date'] == my_dates[0],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[1],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[2]]

With pure Python

The simple way is to group the quantities by date into a dictionary:

import collections

# Each start date maps to the list of qty values recorded on that date
quantities = collections.defaultdict(list)

for v in d.values():
    quantities[v["start date"]].append(v["qty"])

Then run over that dictionary to compute the means:

means = {k: float(sum(q)) / len(q) for k, q in quantities.items()}

Giving:

>>> means
{datetime.datetime(2014, 10, 11, 0, 0): 40.0,
 datetime.datetime(2014, 10, 12, 0, 0): 23.5,
 datetime.datetime(2014, 10, 13, 0, 0): 34.0}
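The grouped means come back as a dictionary keyed by date, while the question asks for a list in the order of my_dates, with 0 for a date that never occurs. A minimal sketch of that last step, assuming the same d and my_dates as in the question; the helper name means_in_order is hypothetical:

```python
import collections
import datetime

my_dates = [datetime.datetime(2014, 10, 12, 0, 0),
            datetime.datetime(2014, 10, 13, 0, 0),
            datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13, 0, 0), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11, 0, 0), 'qty': 40},
}

def means_in_order(records, dates, default=0):
    # Group every qty under its start date
    quantities = collections.defaultdict(list)
    for v in records.values():
        quantities[v['start date']].append(v['qty'])
    # Average each group once
    means = {k: float(sum(q)) / len(q) for k, q in quantities.items()}
    # Read the means back in the requested order; absent dates get the default
    return [means.get(day, default) for day in dates]

my_qty = means_in_order(d, my_dates)
print(my_qty)  # [23.5, 34.0, 0]
```

Using means.get() for the lookup keeps the dictionary of means unchanged, so the same grouping can be reused for other date lists.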

If you wanted to be clever, it's possible to compute the mean in a single pass by keeping the current mean and a tally of the number of values you've seen. You can even abstract this into a class:

class RunningMean(object):
    def __init__(self, mean=None, n=0):
        self.mean = mean
        self.n = n

    def insert(self, other):
        if self.mean is None:
            self.mean = 0.0
        self.mean = (self.mean * self.n + other) / (self.n + 1)
        self.n += 1

    def __repr__(self):
        args = (self.__class__.__name__, self.mean, self.n)
        return "{}(mean={}, n={})".format(*args)

And one pass through your data will give you your answer:

import collections
means = collections.defaultdict(RunningMean)
for v in d.values():
    means[v["start date"]].insert(v["qty"])
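The update in insert(), (mean * n + x) / (n + 1), can be rewritten algebraically as mean + (x - mean) / (n + 1), which avoids scaling the mean back up into a total on every step. A minimal sketch of that variant, assuming the question's d and my_dates and using plain [mean, count] lists instead of the class (the function name running_means is hypothetical), followed by reading the results back in my_dates order with 0 for dates that never appear:

```python
import collections
import datetime

my_dates = [datetime.datetime(2014, 10, 12, 0, 0),
            datetime.datetime(2014, 10, 13, 0, 0),
            datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13, 0, 0), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11, 0, 0), 'qty': 40},
}

def running_means(records):
    # Each date maps to [current mean, number of values seen so far]
    state = collections.defaultdict(lambda: [0.0, 0])
    for v in records.values():
        entry = state[v['start date']]
        entry[1] += 1
        # Incremental update: new_mean = old_mean + (x - old_mean) / new_count
        entry[0] += (v['qty'] - entry[0]) / entry[1]
    return state

state = running_means(d)
# "day in state" checks membership without creating an empty entry
my_qty = [state[day][0] if day in state else 0 for day in my_dates]
print(my_qty)  # [23.5, 34.0, 0]
```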

With pandas

The really simple way is to use the pandas library, as it was made for things like this. Here's some code:

import pandas as pd
df = pd.DataFrame.from_dict(d, orient="index")
means = df.groupby("start date").mean()

Giving:

>>> means
             qty
start date      
2014-10-11  40.0
2014-10-12  23.5
2014-10-13  34.0
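The grouped result is indexed by the dates that actually occur, so 2014-10-11 appears and 2014-10-14 does not. To get the list the question asks for, the per-date means can be aligned to my_dates with Series.reindex, filling missing dates with 0. A minimal sketch, assuming pandas is installed and the question's d and my_dates:

```python
import datetime

import pandas as pd

my_dates = [datetime.datetime(2014, 10, 12, 0, 0),
            datetime.datetime(2014, 10, 13, 0, 0),
            datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13, 0, 0), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11, 0, 0), 'qty': 40},
}

df = pd.DataFrame.from_dict(d, orient="index")
# Mean qty per start date, as a Series indexed by date
per_date = df.groupby("start date")["qty"].mean()
# Align to the requested dates; dates with no rows get 0 instead of NaN
my_qty = per_date.reindex(pd.to_datetime(my_dates), fill_value=0).tolist()
print(my_qty)  # [23.5, 34.0, 0.0]
```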

A one-line answer:

mean_qty = [np.mean([i['qty'] for i in d.values()
                     if i.get('start date') == day] or 0)
            for day in my_dates]

In [12]: mean_qty
Out[12]: [23.5, 34.0, 0.0]

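The purpose of the or 0 is to return 0, as the OP wanted, when no qty values match a date: an empty list is falsy, so np.mean receives 0 instead of []. Without it, np.mean on an empty list returns nan (and emits a RuntimeWarning).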
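If NumPy is not otherwise needed, the standard library's statistics.mean can do the same job. Unlike np.mean, it raises StatisticsError on empty input instead of returning nan, so the empty case has to be handled explicitly; this sketch uses a conditional expression rather than the or 0 trick, assuming the question's d and my_dates:

```python
import datetime
import statistics

my_dates = [datetime.datetime(2014, 10, 12, 0, 0),
            datetime.datetime(2014, 10, 13, 0, 0),
            datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13, 0, 0), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11, 0, 0), 'qty': 40},
}

mean_qty = []
for day in my_dates:
    values = [i['qty'] for i in d.values() if i.get('start date') == day]
    # statistics.mean([]) raises StatisticsError, so fall back to 0 explicitly
    mean_qty.append(statistics.mean(values) if values else 0)

print(mean_qty)
```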

If you need speed, then building on jme's excellent second part, you can do this (I cut his time down by 3x by not recalculating the mean until it's called for):

from collections import defaultdict

class RunningMean(object):
    def __init__(self, total=0.0, n=0):
        self.total = total
        self.n = n

    def __iadd__(self, other):
        # Only accumulate here; the division is deferred to mean()
        self.total += other
        self.n += 1
        return self

    def mean(self):
        return self.total / self.n if self.n else 0

    def __repr__(self):
        return "RunningMean(total=%f, n=%i)" % (self.total, self.n)

means = defaultdict(RunningMean)
for v in d.values():
    means[v["start date"]] += v["qty"]

Out[351]:
[RunningMean(total=40.000000, n=1),
 RunningMean(total=34.000000, n=1),
 RunningMean(total=47.000000, n=2)]
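The same idea of storing a total and dividing only when the mean is needed also works without a class. A minimal sketch, assuming the question's d and my_dates, that keeps a [total, count] pair per date and divides only while building the final list (the helper name mean_for is hypothetical). Reading through .get() avoids a side effect of defaultdict: plain indexing would create an empty entry for every date that is merely looked up.

```python
import collections
import datetime

my_dates = [datetime.datetime(2014, 10, 12, 0, 0),
            datetime.datetime(2014, 10, 13, 0, 0),
            datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13, 0, 0), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11, 0, 0), 'qty': 40},
}

# Each date maps to [running total, number of values]; no division yet
totals = collections.defaultdict(lambda: [0.0, 0])
for v in d.values():
    pair = totals[v['start date']]
    pair[0] += v['qty']
    pair[1] += 1

def mean_for(day, default=0):
    # .get() does not insert missing dates into the defaultdict
    total, n = totals.get(day, (0.0, 0))
    return total / n if n else default

my_qty = [mean_for(day) for day in my_dates]
print(my_qty)  # [23.5, 34.0, 0]
```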

Here is some working code which should help you:

for item in my_dates:
    nums = [d[key]['qty'] for key in d if d[key]['start date'] == item]
    if nums:
        avg = np.mean(nums)
    else:
        avg = 0
    print(item, nums, avg)

Note that np.mean on an empty list returns nan (with a RuntimeWarning) rather than 0, so you have to check whether there are any numbers to average before calling it.
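The empty-list behaviour can be checked directly, and the check can be folded into a small helper so the loop builds the OP's my_qty list instead of printing. A sketch assuming NumPy is installed and the question's d and my_dates; the helper name safe_mean is hypothetical:

```python
import datetime
import math
import warnings

import numpy as np

my_dates = [datetime.datetime(2014, 10, 12, 0, 0),
            datetime.datetime(2014, 10, 13, 0, 0),
            datetime.datetime(2014, 10, 14, 0, 0)]

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13, 0, 0), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12, 0, 0), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11, 0, 0), 'qty': 40},
}

def safe_mean(values, default=0.0):
    # Guard the empty case ourselves instead of letting NumPy return nan
    return float(np.mean(values)) if values else default

# np.mean on an empty list emits RuntimeWarnings and returns nan
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_result = np.mean([])

my_qty = [safe_mean([d[key]['qty'] for key in d if d[key]['start date'] == item])
          for item in my_dates]
print(math.isnan(empty_result), my_qty)  # True [23.5, 34.0, 0.0]
```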

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 