Say I have the following dataframe:
>>> df=pd.DataFrame({'category':['a','a','b','b'],
... 'var1':np.random.randint(0,100,4),
... 'var2':np.random.randint(0,100,4),
... 'weights':np.random.randint(0,10,4)})
>>> df
category var1 var2 weights
0 a 37 36 7
1 a 47 20 1
2 b 33 7 6
3 b 16 6 8
I can calculate the weighted average of a 'var1' as such:
>>> Grouped=df.groupby('category')
>>> GetWeightAvg=lambda g: np.average(g['var1'], weights=g['weights'])
>>> Grouped.apply(GetWeightAvg)
category
a 38.250000
b 23.285714
dtype: float64
However I am wondering if there is a way I can write my function and apply it to my grouped object such that I can specify when applying it, which column I want to calculate for (or both). Rather than have 'var1' written into my function, I'd like to be able to specify when applying the function.
Just as I can get an unweighted average of both columns like this:
>>> Grouped[['var1','var2']].mean()
var1 var2
category
a 42.0 28.0
b 24.5 6.5
I'm wondering if there is a parallel way to do that with weighted averages.
You can apply and return both averages:
In [11]: g.apply(lambda x: pd.Series(np.average(x[["var1", "var2"]], weights=x["weights"], axis=0), ["var1", "var2"]))
Out[11]:
var1 var2
category
a 38.250000 34.000000
b 23.285714 6.428571
You could write this slightly cleaner as a function:
In [21]: def weighted(x, cols, w="weights"):
return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)
In [22]: g.apply(weighted, ["var1", "var2"])
Out[22]:
var1 var2
category
a 38.250000 34.000000
b 23.285714 6.428571
Following up from Andy's solution, I was seeking to use one of the index levels from a multi index as my weights.
np.random.seed(1)
arrays = [list('AAABBB'), [0.01,0.02,0.03,0.07,0.09,0.11]]
tups = list(zip(*arrays))
x = pd.MultiIndex.from_tuples(tups)
df = pd.DataFrame(index=x,data= np.random.randint(10,100,(6,6)),columns = list('STUVWX'))
df.index.names = ['bin','prob']
S T U V W X
bin prob
A 0.0100 47 22 82 19 85 15
0.0200 89 74 26 11 86 81
0.0300 16 35 60 30 28 94
B 0.0700 21 38 39 24 60 78
0.0900 97 97 96 23 19 17
0.1100 73 71 32 67 11 10
Adapting function to use one of index levels as the weights.
def weighted(x, w="weights"):
return pd.Series(np.average(x, weights=x.index.get_level_values(w), axis=0),index= x.columns)
and calling
df.groupby(level=['bin']).apply(weighted, "prob")
which gives:
S T U V W X
bin
A 45.5000 45.8333 52.3333 21.8333 56.8333 76.5000
B 67.5185 71.1111 55.1481 41.1852 26.3704 29.9630
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.