
Multiplying a pd.Series vector against a MultiIndex pd.DataFrame

I have a pandas Series and a pandas DataFrame with a MultiIndex.

Here is a simplified example of the situation:

import numpy as np
import pandas as pd

iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame(index = i)
xf['price'] = np.random.randint(1, 25, xf.shape[0])

allocation_vector = pd.Series([0.3, 0.6, 0.1], index = ['milk', 'honey', 'dates'])

This DataFrame holds the price of three products in each month from jan through apr. The allocation_vector holds the fractional share (weight) of each product.

What I want is to multiply the allocation vector by my DataFrame, giving a Series indexed by 'jan', 'feb', 'mar', 'apr' whose value for each month is the dot product for that month (i.e. jan_dates_price*dates_pct + jan_milk_price*milk_pct + jan_honey_price*honey_pct, and likewise for feb, mar and apr).
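
Written as a formula, with p_{g,m} the price of good g in month m and w_g the allocation share of good g (notation introduced here only for clarity), the target is a weighted sum over the goods for each month:

```latex
s_m = \sum_{g \in \{\text{milk},\,\text{honey},\,\text{dates}\}} w_g \, p_{g,m},
\qquad m \in \{\text{jan},\,\text{feb},\,\text{mar},\,\text{apr}\}
```

In matrix form, if P is the goods-by-months price table, this is s = P^T w, which is exactly what the Series.unstack / DataFrame.dot answer computes.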

So far I have only managed this with ugly, loop-based hacks. There must be a more pythonic way, ideally one where I don't have to worry about the entries of the vector being in a different order from the goods in the DataFrame. The real DataFrame also has more columns that aren't involved in the calculation.

I believe you need to multiply by the first index level with Series.mul and then sum per second level (the months):

import numpy as np
import pandas as pd

np.random.seed(2019)

iterables = [['milk', 'honey', 'dates'], ['jan', 'feb', 'mar', 'apr']]
i = pd.MultiIndex.from_product(iterables, names=['good', 'month'])
xf = pd.DataFrame(index = i)
xf['price'] = np.random.randint(1, 25, xf.shape[0])
print (xf)
             price
good  month       
milk  jan        9
      feb       19
      mar        6
      apr       23
honey jan       16
      feb       13
      mar       11
      apr       17
dates jan       17
      feb        8
      mar        6
      apr       20

allocation_vector = pd.Series([0.3, 0.6, 0.1], index = ['milk', 'honey', 'dates'])

print (17*0.1+9*0.3+16*0.6)
14.0

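# Series.sum(level=...) was removed in pandas 2.0; groupby(level=..., sort=False) replaces it
s = xf['price'].mul(allocation_vector, level=0).groupby(level=1, sort=False).sum()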
print (s)
month
jan    14.0
feb    14.3
mar     9.0
apr    19.1
dtype: float64
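
Because mul(..., level=0) matches the weights to the good level by label, the order of the weight vector should not matter, and extra columns are ignored once price is selected. A small sketch checking that, assuming hypothetical fixed prices (the values from the printed table) instead of random ones, the same weights deliberately listed in a different order, and a made-up extra column volume:

```python
import pandas as pd

# Hypothetical fixed prices (same values as the printed table), replacing the random ones.
goods = ['milk', 'honey', 'dates']
months = ['jan', 'feb', 'mar', 'apr']
idx = pd.MultiIndex.from_product([goods, months], names=['good', 'month'])
xf = pd.DataFrame(index=idx)
xf['price'] = [9, 19, 6, 23, 16, 13, 11, 17, 17, 8, 6, 20]
# Made-up extra column, standing in for the columns not involved in the calculation.
xf['volume'] = 100

# Same weights as allocation_vector, deliberately listed in a different order.
weights = pd.Series([0.1, 0.3, 0.6], index=['dates', 'milk', 'honey'])

# mul(level='good') matches weights to the 'good' index level by label, not position.
weighted = xf['price'].mul(weights, level='good')

# Sum over the goods within each month; sort=False asks for first-appearance order.
s = weighted.groupby(level='month', sort=False).sum()
print(s)
```

Passing the level by name (level='good') rather than by position also keeps working if the index levels are later reordered.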

Or reshape with Series.unstack, transpose, and use DataFrame.dot, but then the months in the output come out in alphabetical order:

s = xf['price'].unstack().T.dot(allocation_vector)
print (s)
month
apr    19.1
feb    14.3
jan    14.0
mar     9.0
dtype: float64
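
If the calendar order matters, one option is to reindex the result with an explicit month order, since pandas only sees the month names as strings. A sketch of that, which also replaces the transpose with Series.dot (it aligns the weights with the unstacked table's row labels, the goods, and returns one value per column, i.e. per month); month_order is a hypothetical name and the prices are the fixed values from the printed table:

```python
import pandas as pd

goods = ['milk', 'honey', 'dates']
month_order = ['jan', 'feb', 'mar', 'apr']  # hypothetical: the known calendar order
idx = pd.MultiIndex.from_product([goods, month_order], names=['good', 'month'])
price = pd.Series([9, 19, 6, 23, 16, 13, 11, 17, 17, 8, 6, 20], index=idx, name='price')
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])

# Rows = goods, columns = months (the months come out alphabetically sorted).
wide = price.unstack('month')

# Series.dot aligns allocation_vector's labels with wide's row labels (the goods),
# so the order of the weights does not matter; the result is indexed by month.
s = allocation_vector.dot(wide)

# Put the months back in calendar order.
s = s.reindex(month_order)
print(s)
```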

You can achieve what you want using a combination of join and groupby as shown below:

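allocation_vector.name = 'pct'              # join needs a named Series
xf = xf.join(allocation_vector, on='good')  # look up each row's weight via its 'good' level
xf['dotproduct'] = xf.price * xf.pct        # per-row weighted price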

print(xf)

The resulting dataframe is:

             price  pct  dotproduct
good  month
milk  jan       19  0.3         5.7
      feb        8  0.3         2.4
      mar        7  0.3         2.1
      apr       15  0.3         4.5
honey jan        9  0.6         5.4
      feb       10  0.6         6.0
      mar        7  0.6         4.2
      apr       11  0.6         6.6
dates jan        2  0.1         0.2
      feb       14  0.1         1.4
      mar       12  0.1         1.2
      apr        7  0.1         0.7

Then sum the per-row products for each month (groupby sorts the months alphabetically by default):

print(xf.groupby('month')['dotproduct'].sum())

The output is:

month
apr    11.8
feb     9.8
jan    11.3
mar     7.5
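
The version above modifies xf and allocation_vector in place, and its dotproduct column actually holds per-row products rather than a dot product. A sketch of the same join-and-groupby idea that leaves both inputs untouched, assuming hypothetical fixed prices copied from the table above so the result can be compared with the printed output:

```python
import pandas as pd

goods = ['milk', 'honey', 'dates']
months = ['jan', 'feb', 'mar', 'apr']
idx = pd.MultiIndex.from_product([goods, months], names=['good', 'month'])
# Hypothetical fixed prices, taken from the printed table above.
xf = pd.DataFrame({'price': [19, 8, 7, 15, 9, 10, 7, 11, 2, 14, 12, 7]}, index=idx)
allocation_vector = pd.Series([0.3, 0.6, 0.1], index=['milk', 'honey', 'dates'])

# join needs a named Series; rename() returns a named copy, leaving allocation_vector as is.
joined = xf.join(allocation_vector.rename('pct'), on='good')

# Per-row weighted price; summing these within each month gives the dot product.
weighted = joined['price'] * joined['pct']

# Group on the 'month' index level; sort=False asks for first-appearance order.
result = weighted.groupby(level='month', sort=False).sum()
print(result)
```

With the default sort=True the months come back in the alphabetical order shown above.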

