I have a pandas Series and a pandas DataFrame with a MultiIndex.
Here is a simplified example of the situation:
import numpy as np
import pandas as pd
iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame(index=i)
xf['price'] = np.random.randint(1, 25, xf.shape[0])
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])
This dataframe represents the price of three products in each month from jan through apr. The allocation_vector represents a fractional share (weight) for each product.
What I want to achieve is multiplying the allocation vector by my dataframe, resulting in a series indexed by 'jan', 'feb', 'mar', 'apr' whose value for each month is the dot product for that month (i.e. jan_dates_price*dates_pct + jan_milk_price*milk_pct + jan_honey_price*honey_pct, and likewise for feb, mar and apr).
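Written as a formula (the symbols are introduced here only for illustration: $p_{g,m}$ is the price of good $g$ in month $m$, $w_g$ is that good's entry in allocation_vector, and $s_m$ is the wanted value for month $m$):

$$
s_m = \sum_{g \in \{\text{milk},\,\text{honey},\,\text{dates}\}} w_g \, p_{g,m}, \qquad m \in \{\text{jan},\,\text{feb},\,\text{mar},\,\text{apr}\}
$$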
So far I've only been able to solve this with nasty, hacky iterative solutions. I figure there must be a more pythonic way to do this, ideally one where I don't have to worry about the vector's entries being in the wrong order relative to the dataframe when multiplying. Of course, the actual dataframe has more columns that aren't involved in the calculation.
I believe you need to multiply by the first level (good) with Series.mul and then sum per the second level (month):
import numpy as np
import pandas as pd
np.random.seed(2019)
iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame(index = i)
xf['price'] = np.random.randint(1, 25, xf.shape[0])
print (xf)
             price
good  month
milk  jan        9
      feb       19
      mar        6
      apr       23
honey jan       16
      feb       13
      mar       11
      apr       17
dates jan       17
      feb        8
      mar        6
      apr       20
allocation_vector = pd.Series([0.3, 0.6, 0.1], index = ['milk', 'honey', 'dates'])
print (17*0.1+9*0.3+16*0.6)
14.0
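# Series.sum(level=...) was removed in pandas 2.0; groupby(level=...) replaces it
s = xf['price'].mul(allocation_vector, level=0).groupby(level=1, sort=False).sum()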
print (s)
month
jan 14.0
feb 14.3
mar 9.0
apr 19.1
dtype: float64
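A self-contained sketch of the same mul/groupby idea, assuming pandas 2.x. The prices are hard-coded from the printed frame above instead of drawn at random, `other` is a hypothetical extra column standing in for the columns not involved in the calculation, and the allocation vector is deliberately listed in a different order to show that `mul(..., level=...)` matches goods by label rather than by position:

```python
import pandas as pd

# Prices copied from the printed frame above (hard-coded instead of random).
idx = pd.MultiIndex.from_product(
    [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']],
    names=['good', 'month'],
)
xf = pd.DataFrame(
    {
        'price': [9, 19, 6, 23, 16, 13, 11, 17, 17, 8, 6, 20],
        # Hypothetical extra column that is not part of the calculation.
        'other': range(12),
    },
    index=idx,
)

# Deliberately in a different order: mul(level=...) aligns by label.
allocation_vector = pd.Series({'dates': 0.1, 'milk': 0.3, 'honey': 0.6})

# Multiply each price by the weight of its 'good', then add up per month.
# sort=False keeps the months in order of first appearance (jan..apr).
s = (
    xf['price']
    .mul(allocation_vector, level='good')
    .groupby(level='month', sort=False)
    .sum()
)
print(s)
```

Because both steps work on index labels, neither the row order of xf nor the order of allocation_vector matters, and the extra column is never touched since only `xf['price']` is used.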
Or reshape with Series.unstack, transpose, and use DataFrame.dot; note that the months in the output then come out in alphabetical order:
s = xf['price'].unstack().T.dot(allocation_vector)
print (s)
month
apr 19.1
feb 14.3
jan 14.0
mar 9.0
dtype: float64
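If calendar order matters for the dot approach, one option (a sketch, assuming pandas 2.x and the same hard-coded prices as the seeded frame above) is to unstack the `good` level so the months become rows directly, which makes the transpose unnecessary, and then reindex the result by the month labels in the order they first appear in the original index:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']],
    names=['good', 'month'],
)
xf = pd.DataFrame({'price': [9, 19, 6, 23, 16, 13, 11, 17, 17, 8, 6, 20]}, index=idx)
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])

# Rows = months, columns = goods (both sorted alphabetically by unstack).
wide = xf['price'].unstack('good')

# DataFrame.dot aligns wide's columns with allocation_vector's index by label.
s = wide.dot(allocation_vector)

# Restore the original month order (order of first appearance in the index).
s = s.reindex(xf.index.unique(level='month'))
print(s)
```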
You can achieve what you want using a combination of join and groupby, as shown below:
allocation_vector.name = 'pct'  # the Series name becomes the joined column's name
xf = xf.join(allocation_vector, on='good')  # look up each row's pct by its 'good' label
xf['dotproduct'] = xf.price * xf.pct
print(xf)
The resulting dataframe is (the prices differ from the frame above because no random seed was set here):
             price  pct  dotproduct
good  month
milk  jan       19  0.3         5.7
      feb        8  0.3         2.4
      mar        7  0.3         2.1
      apr       15  0.3         4.5
honey jan        9  0.6         5.4
      feb       10  0.6         6.0
      mar        7  0.6         4.2
      apr       11  0.6         6.6
dates jan        2  0.1         0.2
      feb       14  0.1         1.4
      mar       12  0.1         1.2
      apr        7  0.1         0.7
And then you can get the result you need using:
print(xf.groupby('month')['dotproduct'].sum())
The output is:
month
apr 11.8
feb 9.8
jan 11.3
mar 7.5
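A variant of the same idea, sketched under the assumption of pandas 2.x: instead of joining a `pct` column onto xf, it looks up each row's weight from its `good` label with Index.map, which leaves xf unchanged, and passes `sort=False` to groupby so the months keep their original order rather than the alphabetical order shown above. The prices are copied from the join example's printed frame; this is a swapped-in technique, not the answer's exact code:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']],
    names=['good', 'month'],
)
# Prices copied from the join example's printed frame above.
xf = pd.DataFrame({'price': [19, 8, 7, 15, 9, 10, 7, 11, 2, 14, 12, 7]}, index=idx)
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])

# Look up each row's weight from its 'good' label (same effect as the join,
# but without adding columns to xf).
pct = xf.index.get_level_values('good').map(allocation_vector)
dotproduct = xf['price'] * pct.to_numpy()

# sort=False keeps months in order of first appearance instead of alphabetical.
result = dotproduct.groupby(level='month', sort=False).sum()
print(result)
```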