I have a dictionary of dictionaries where the keys of the inner dictionaries represent the bins of a histogram and the values represent the frequencies. I want to calculate the frequency-weighted mean and standard deviation (std) of the bins.
dict = {'Group 1' : {1 : 100, 2:300, 4:100, 5:50},
{'Group 2' : {1 : 50, 2: 300},
{'Group 3' : {4 : 100, 5: 200},
...}
Example for Group 1:
I want the mean and std to be identical to those of a list of 100 1's, 300 2's, 100 4's and 50 5's:
import numpy as np
l = []
l.extend([1 for j in range(0,100)])
l.extend([2 for j in range(0,300)])
l.extend([4 for j in range(0,100)])
l.extend([5 for j in range(0,50)])
np.mean(l)  # about 2.45
np.std(l)   # about 1.23
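Because every copy of a bin value counts equally in np.mean and np.std, the statistics of the expanded list depend only on the bins and their frequencies. Writing $x_i$ for a bin value and $f_i$ for its frequency (notation introduced here for illustration), the mean and the population standard deviation (np.std's default, ddof=0) are:

$$
\mu = \frac{\sum_i f_i x_i}{\sum_i f_i}, \qquad
\sigma = \sqrt{\frac{\sum_i f_i (x_i - \mu)^2}{\sum_i f_i}}
$$

For Group 1, using the equivalent form $\sigma^2 = \sum_i f_i x_i^2 / \sum_i f_i - \mu^2$:

$$
\sum_i f_i = 550, \quad \sum_i f_i x_i = 1350, \quad \mu = \frac{1350}{550} = \frac{27}{11} \approx 2.4545
$$

$$
\sigma^2 = \frac{4150}{550} - \left(\frac{27}{11}\right)^2 = \frac{184}{121}, \qquad \sigma \approx 1.2332
$$

which agrees with the rounded values 2.45 and 1.23 above.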
What would be the best way to iterate over each dictionary and transform it such that I get a dictionary of dictionaries representing the mean and std of the bins of the inner dictionaries?
transformed_dictionary = {'Group 1' : {'mean': 2.45 , 'std' : 1.23},
'Group 2' : {...},
...}
What could be an efficient way of doing this?
First, you should not name your dictionary dict, which will mask the builtin class named dict. Second, your declaration of dict is not valid Python: the extra opening braces before 'Group 2' and 'Group 3' leave the braces unbalanced. (Also, strictly speaking, it is not a "dictionary of dictionaries" but a dictionary whose values are dictionaries.)
import numpy as np
d = {'Group 1' : {1 : 100, 2:300, 4:100, 5:50},
     'Group 2' : {1 : 50, 2: 300},
     'Group 3' : {4 : 100, 5: 200}
    }
transformed_dictionary = {}
for k, v in d.items():
    # Expand the histogram into a flat list: each bin value repeated by its frequency
    l = []
    for item in v.items():
        l.extend([item[0] for j in range(item[1])])
    transformed_dictionary[k] = {'mean': np.mean(l), 'std': np.std(l)}
print(transformed_dictionary)
Prints:
{'Group 1': {'mean': 2.4545454545454546, 'std': 1.233150906022776}, 'Group 2': {'mean': 1.8571428571428572, 'std': 0.34992710611188255}, 'Group 3': {'mean': 4.666666666666667, 'std': 0.4714045207910316}}
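If you want to keep this expand-then-compute approach, the inner loop can be replaced by a single np.repeat call, which repeats each bin value by its frequency. This is a minimal sketch assuming the same d as above; the name transformed is illustrative, and the float() calls are only there so the dict prints plain Python floats.

```python
import numpy as np

# Keys of each inner dict are bin values, values are frequencies.
d = {'Group 1': {1: 100, 2: 300, 4: 100, 5: 50},
     'Group 2': {1: 50, 2: 300},
     'Group 3': {4: 100, 5: 200}}

transformed = {}
for group, hist in d.items():
    # np.repeat([1, 2], [2, 1]) -> array([1, 1, 2]): the same samples the loop builds
    samples = np.repeat(list(hist.keys()), list(hist.values()))
    # np.std uses ddof=0 by default, matching np.std(l) on the expanded list
    transformed[group] = {'mean': float(samples.mean()), 'std': float(samples.std())}

print(transformed)
```

Like the loop, this still materializes every sample, so memory grows with the total frequency.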
To avoid building an auxiliary list, you can use np.average with the weights= parameter:
import numpy as np
def weighted_avg_and_std(values, weights):
    """
    Return the weighted average and standard deviation.
    values, weights -- array-likes (lists or NumPy ndarrays) with the same shape.
    """
    values = np.asarray(values, dtype=float)
    average = np.average(values, weights=weights)
    # Fast and numerically precise:
    variance = np.average((values-average)**2, weights=weights)
    return average, np.sqrt(variance)
d = {'Group 1' : {1 : 100, 2:300, 4:100, 5:50},
     'Group 2' : {1 : 50, 2: 300},
     'Group 3' : {4 : 100, 5: 200}}
out = {}
for k, v in d.items():
    # [*v] -> bin values (the keys), [*v.values()] -> their frequencies (the weights)
    m, s = weighted_avg_and_std([*v], [*v.values()])
    out[k] = {
        'mean': m,
        'std': s
    }
print(out)
Prints:
{'Group 1': {'mean': 2.4545454545454546, 'std': 1.2331509060227759},
'Group 2': {'mean': 1.8571428571428572, 'std': 0.3499271061118826},
'Group 3': {'mean': 4.666666666666667, 'std': 0.4714045207910317}}
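If NumPy is not available, the same weighted formulas can be evaluated with the standard library alone. This is a minimal sketch: hist_mean_std is a hypothetical helper name, it assumes every histogram has a non-zero total frequency, and it returns the population standard deviation to match np.std's default (ddof=0).

```python
import math

def hist_mean_std(hist):
    """Hypothetical helper: mean and population std of a {bin_value: frequency} dict."""
    n = sum(hist.values())  # total number of samples; assumed > 0
    # Frequency-weighted mean of the bin values
    mean = math.fsum(x * f for x, f in hist.items()) / n
    # Frequency-weighted mean squared deviation, divided by n (ddof=0)
    variance = math.fsum(f * (x - mean) ** 2 for x, f in hist.items()) / n
    return mean, math.sqrt(variance)

d = {'Group 1': {1: 100, 2: 300, 4: 100, 5: 50},
     'Group 2': {1: 50, 2: 300},
     'Group 3': {4: 100, 5: 200}}

out = {}
for k, v in d.items():
    m, s = hist_mean_std(v)
    out[k] = {'mean': m, 'std': s}
print(out)
```

Only one pass per statistic over the bins is needed, so the cost depends on the number of distinct bins, not on the frequencies.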