Average values in nested dictionary
I would like to create a new list of values, my_qty, where each item is equal to the average of all values in d[key]['qty'] where d[key]['start date'] matches a value in my_dates. I think I am close, but am getting hung up on the nested portion.
import datetime
import numpy as np
my_dates = [datetime.datetime(2014, 10, 12, 0, 0), datetime.datetime(2014, 10, 13, 0, 0), datetime.datetime(2014, 10, 14, 0, 0)]
d = {
'ID1' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 12},
'ID2' : {'start date': datetime.datetime(2014, 10, 13, 0, 0) , 'qty': 34},
'ID3' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 35},
'ID4' : {'start date': datetime.datetime(2014, 10, 11, 0, 0) , 'qty': 40},
}
my_qty = []
for item in my_dates:
    my_qty.append([np.mean(x for x in d[key]['qty']) if d[key]['start date'] == my_dates[item]])
print my_qty
Desired Output:
[23.5,34,0]
To clarify the output per request:
[average of d[key]['qty'] where d[key]['start date'] == my_dates[0], average of d[key]['qty'] where d[key]['start date'] == my_dates[1], average of d[key]['qty'] where d[key]['start date'] == my_dates[2]]
The simple way is to group the quantities by date into a dictionary:
import collections
quantities = collections.defaultdict(lambda: [])
for k,v in d.iteritems():
    quantities[v["start date"]].append(v["qty"])
Then run over that dictionary to compute the means:
means = {k: float(sum(q))/len(q) for k,q in quantities.iteritems()}
Giving:
>>> means
{datetime.datetime(2014, 10, 11, 0, 0): 40.0,
 datetime.datetime(2014, 10, 12, 0, 0): 23.5,
 datetime.datetime(2014, 10, 13, 0, 0): 34.0}
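The question asks for a list aligned with my_dates rather than a dictionary, so one more lookup step is needed. The following is a minimal sketch of the whole grouping approach, written for Python 3 (an assumption, since the code in this thread targets Python 2). It falls back to 0 for any date that has no entries.

```python
import datetime
from collections import defaultdict

my_dates = [
    datetime.datetime(2014, 10, 12),
    datetime.datetime(2014, 10, 13),
    datetime.datetime(2014, 10, 14),
]
d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11), 'qty': 40},
}

# Group the quantities by start date.
quantities = defaultdict(list)
for record in d.values():
    quantities[record['start date']].append(record['qty'])

# Mean per date; "/" is true division in Python 3.
means = {day: sum(q) / len(q) for day, q in quantities.items()}

# Look up each requested date, using 0 when the date never occurs in d.
my_qty = [means.get(day, 0) for day in my_dates]
print(my_qty)  # [23.5, 34.0, 0]
```

Using means.get(day, 0) instead of means[day] avoids a KeyError for 2014-10-14, which has no matching records.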
If you wanted to be clever, it's possible to compute the mean in a single pass by keeping the current mean and a tally of the number of values you've seen. You can even abstract this in a class:
class RunningMean(object):
    def __init__(self, mean=None, n=0):
        self.mean = mean
        self.n = n

    def insert(self, other):
        if self.mean is None:
            self.mean = 0.0
        self.mean = (self.mean * self.n + other) / (self.n + 1)
        self.n += 1

    def __repr__(self):
        args = (self.__class__.__name__, self.mean, self.n)
        return "{}(mean={}, n={})".format(*args)
And one pass through your data will give you your answer:
import collections
means = collections.defaultdict(lambda: RunningMean())
for k,v in d.iteritems():
    means[v["start date"]].insert(v["qty"])
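The running-mean update can also be written as mean + (x - mean) / n, which is algebraically the same as (mean * (n - 1) + x) / n but avoids multiplying the mean back up. The sketch below is a class-free variant of the same single-pass idea, written for Python 3. The [mean, count] pair layout and the names state and running_means are illustrative choices, not part of the answer above.

```python
import datetime
from collections import defaultdict

d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11), 'qty': 40},
}

# Hypothetical layout: date -> [current mean, number of values seen].
state = defaultdict(lambda: [0.0, 0])

for record in d.values():
    entry = state[record['start date']]
    entry[1] += 1
    # Incremental update: shift the mean toward the new value by 1/n of the gap.
    entry[0] += (record['qty'] - entry[0]) / entry[1]

running_means = {day: mean for day, (mean, count) in state.items()}
print(running_means[datetime.datetime(2014, 10, 12)])  # 23.5
```

For 2014-10-12 the updates are 0.0 + 12/1 = 12.0, then 12.0 + (35 - 12)/2 = 23.5, matching the grouped result.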
The really simple way is to use the pandas library, as it was made for things like this. Here's some code:
import pandas as pd
df = pd.DataFrame.from_dict(d, orient="index")
means = df.groupby("start date").aggregate(np.mean)
Giving:
>>> means
             qty
start date
2014-10-11  40.0
2014-10-12  23.5
2014-10-13  34.0
The one line answer:
mean_qty = [np.mean([i['qty'] for i in d.values()\
if i.get('start date') == day] or 0) for day in my_dates]
In [12]: mean_qty
Out[12]: [23.5, 34.0, 0.0]
The purpose of or 0 is to return 0, as the OP wanted, when no qty values match a date, since np.mean on an empty list returns nan by default.
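The fallback works because an empty list is falsy in Python, so [] or 0 evaluates to 0. Here is a small standard-library sketch of the same idea for Python 3. The helper name mean_or_zero is made up for illustration, and it uses statistics.mean with an explicit conditional instead of np.mean.

```python
from statistics import mean

def mean_or_zero(values):
    """Return the mean of values, or 0 when values is empty."""
    # An empty list is falsy, so the fallback branch is taken.
    return mean(values) if values else 0

print([] or 0)                  # 0
print(mean_or_zero([12, 35]))   # 23.5
print(mean_or_zero([]))         # 0
```

The conditional form is needed here because statistics.mean expects an iterable and would reject the bare 0 that the or 0 trick passes to np.mean.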
If you need speed, then building on jme's excellent second part, you can do this (I cut his time down by 3x by not recalculating the mean until it's called for):
from collections import defaultdict

class RunningMean(object):
    def __init__(self, total=0.0, n=0):
        self.total = total
        self.n = n

    def __iadd__(self, other):
        self.total += other
        self.n += 1
        return self

    def mean(self):
        # Computed only on demand; returns 0 if nothing has been added.
        return (self.total / self.n if self.n else 0)

    def __repr__(self):
        return "RunningMean(total=%f, n=%i)" % (self.total, self.n)

means = defaultdict(RunningMean)
for v in d.values():
    means[v["start date"]] += v["qty"]
Out[351]:
[RunningMean(mean= 40.000000),
RunningMean(mean= 34.000000),
RunningMean(mean= 23.500000)]
Here is some working code which should help you:
for item in my_dates:
    nums = [ d[key]['qty'] for key in d if d[key]['start date'] == item ]
    if len(nums):
        avg = np.mean(nums)
    else:
        avg = 0
    print item, nums, avg
Note that np.mean returns nan (and emits a RuntimeWarning) for an empty list, so you have to check the length of the list of numbers you want to average.