
Multiply all columns of a multi-indexed DataFrame by appropriate values in a Series

I feel like this one should be obvious, but I'm a bit stuck.

I have a DataFrame (df) with a 3-level MultiIndex on the rows. One of the levels of the MultiIndex is ccy, which represents the currency that the data in that row are denominated in. Each row has 3 columns of data.

I would like to convert all of the data so that it is denominated in a reference currency (say USD). To do this, I have a Series (forex) that contains foreign exchange rates for the relevant currencies.

So the goal is simple: multiply all the data in each row of df by the value of forex that corresponds to the ccy entry of that row's index.
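To make the per-row rule concrete, here is a deliberately naive loop on a small hypothetical three-row frame with the same index levels. It is a sketch for clarity only, not something you would want to run on real data:

```python
import pandas as pd

# Hypothetical three-row frame with the same index levels as the question.
idx = pd.MultiIndex.from_tuples(
    [("a", "one", "EUR"), ("b", "two", "USD"), ("d", "one", "GBP")],
    names=["letter", "number", "ccy"],
)
df = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    index=idx,
    columns=["val_1", "val_2", "val_3"],
)
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Deliberately naive: walk the rows, read each row's ccy label and multiply
# every column of that row by the matching exchange rate.
converted = df.copy()
for i, ccy in enumerate(df.index.get_level_values("ccy")):
    converted.iloc[i] = df.iloc[i] * forex[ccy]

print(converted)
```

Every vectorized approach discussed below should produce the same numbers as this loop.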

The mechanical setup looks like this:

import pandas as pd
import numpy as np
import itertools

np.random.seed(0)

tuples = list(itertools.product(
                                list('abd'), 
                                ['one', 'two', 'three'], 
                                ['USD', 'EUR', 'GBP']
                                ))

np.random.shuffle(tuples)

idx = pd.MultiIndex.from_tuples(tuples[:-10], names=['letter', 'number', 'ccy'])

df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx,
                  columns=['val_1', 'val_2', 'val_3'])

forex = pd.Series({'USD': 1.0,
                   'EUR': 1.3,
                   'GBP': 1.7})
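Before converting, it may be worth checking that forex has a rate for every currency that appears in the ccy level: with the alignment-based approaches below, a currency missing from forex would come out as NaN rather than raising an error. A minimal sketch on the setup above:

```python
import itertools

import numpy as np
import pandas as pd

# Recreate the question's setup so this snippet runs on its own.
np.random.seed(0)
tuples = list(itertools.product(list("abd"), ["one", "two", "three"], ["USD", "EUR", "GBP"]))
np.random.shuffle(tuples)
idx = pd.MultiIndex.from_tuples(tuples[:-10], names=["letter", "number", "ccy"])
df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx, columns=["val_1", "val_2", "val_3"])
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Currencies that occur in the ccy level but have no rate in forex.
used = set(df.index.get_level_values("ccy"))
missing = used - set(forex.index)
print(f"rows: {len(df)}, currencies used: {sorted(used)}, missing rates: {sorted(missing)}")
```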

I can get what I need by running:

df.apply(lambda col: col.mul(forex, level='ccy'), axis=0)

But it seems weird to me that I would need to use pd.DataFrame.apply in such a simple case. I would have expected the following syntax (or something very much like it) to work:

df.mul(forex, level='ccy', axis=0)

but that gives me:

ValueError: cannot reindex from a duplicate axis

Clearly the apply approach isn't a disaster, but it just seems weird that I couldn't figure out the syntax for doing this directly across all the columns with mul. Is there a more direct way to handle this? If not, is there an intuitive reason the mul syntax shouldn't be enhanced to work this way?
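One direct way that does not depend on mul's level argument at all (a sketch, and a different technique from level alignment) is to turn the ccy level into a per-row array of rates and multiply along axis=0:

```python
import itertools

import numpy as np
import pandas as pd

# Recreate the question's setup so this snippet runs on its own.
np.random.seed(0)
tuples = list(itertools.product(list("abd"), ["one", "two", "three"], ["USD", "EUR", "GBP"]))
np.random.shuffle(tuples)
idx = pd.MultiIndex.from_tuples(tuples[:-10], names=["letter", "number", "ccy"])
df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx, columns=["val_1", "val_2", "val_3"])
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Map each row's ccy label to its rate, giving one number per row ...
row_rates = df.index.get_level_values("ccy").map(forex).to_numpy()
# ... then broadcast that vector down the rows (axis=0), across all columns.
converted = df.mul(row_rates, axis=0)

# Should match the apply-based version from the question.
via_apply = df.apply(lambda col: col.mul(forex, level="ccy"), axis=0)
pd.testing.assert_frame_equal(converted.sort_index(), via_apply.sort_index())
```

Because the rates are passed as a plain NumPy array of the same length as df, no index alignment happens: the i-th rate simply scales the i-th row.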

This now works in master/0.14 (i.e. pandas 0.14 and later). See the pull request: https://github.com/pydata/pandas/pull/6682

In [11]: df.mul(forex,level='ccy',axis=0)
Out[11]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    GBP -2.172854  2.443530 -0.132098
d      three  USD  1.089630  0.096543  1.418667
b      two    GBP  1.986064  1.610216  1.845328
       three  GBP  4.049782 -0.690240  0.452957
a      two    GBP -2.304713 -0.193974 -1.435192
b      one    GBP  1.199589 -0.677936 -1.406234
d      two    GBP -0.706766 -0.891671  1.382272
b      two    EUR -0.298026  2.810233 -1.244011
d      one    EUR  0.087504  0.268448 -0.593946
              GBP -1.801959  1.045427  2.430423
b      three  EUR -0.275538 -0.104438  0.527017
a      one    EUR  0.154189  1.630738  1.844833
b      one    EUR -0.967013 -3.272668 -1.959225
d      three  GBP  1.953429 -2.029083  1.939772
              EUR  1.962279  1.388108 -0.892566
a      three  GBP  0.025285 -0.638632 -0.064980
              USD  0.367974 -0.044724 -0.302375

[17 rows x 3 columns]
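A quick way to convince yourself that the level-aligned mul scales each row by its own currency's rate (assuming pandas 0.14 or later) is to divide the result by the original frame and summarize the ratios per currency. A sketch:

```python
import itertools

import numpy as np
import pandas as pd

# Recreate the question's setup so this snippet runs on its own.
np.random.seed(0)
tuples = list(itertools.product(list("abd"), ["one", "two", "three"], ["USD", "EUR", "GBP"]))
np.random.shuffle(tuples)
idx = pd.MultiIndex.from_tuples(tuples[:-10], names=["letter", "number", "ccy"])
df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx, columns=["val_1", "val_2", "val_3"])
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# forex's index is matched against the 'ccy' level of df's row index;
# axis=0 says the Series runs down the rows.
converted = df.mul(forex, level="ccy", axis=0)

# Within each currency, every cell should have been scaled by the same rate,
# so the min and max element-wise ratio per currency should both equal it.
summary = (converted / df).groupby(level="ccy").agg(["min", "max"])
print(summary)
```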

Here is another way to do it (also requires master/0.14):

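In [127]: df = df.sort_index()  # written as df.sortlevel() in pandas 0.14; sortlevel() was later deprecated in favor of sort_index()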

In [128]: df
Out[128]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    EUR  0.118607  1.254414  1.419102
              GBP -1.278149  1.437371 -0.077705
       three  GBP  0.014873 -0.375666 -0.038224
              USD  0.367974 -0.044724 -0.302375
       two    GBP -1.355714 -0.114103 -0.844231
b      one    EUR -0.743856 -2.517437 -1.507096
              GBP  0.705641 -0.398786 -0.827197
       three  EUR -0.211952 -0.080337  0.405398
              GBP  2.382224 -0.406024  0.266445
       two    EUR -0.229251  2.161717 -0.956931
              GBP  1.168273  0.947186  1.085487
d      one    EUR  0.067311  0.206499 -0.456881
              GBP -1.059976  0.614957  1.429661
       three  EUR  1.509445  1.067775 -0.686589
              GBP  1.149076 -1.193578  1.141042
              USD  1.089630  0.096543  1.418667
       two    GBP -0.415745 -0.524512  0.813101

[17 rows x 3 columns]

idx = pd.IndexSlice

In [129]: pd.concat([df.loc[idx[:, :, x], :] * v for x, v in forex.items()])  # older pandas: forex.iteritems()
Out[129]: 
                      val_1     val_2     val_3
letter number ccy                              
a      one    EUR  0.154189  1.630738  1.844833
b      one    EUR -0.967013 -3.272668 -1.959225
       three  EUR -0.275538 -0.104438  0.527017
       two    EUR -0.298026  2.810233 -1.244011
d      one    EUR  0.087504  0.268448 -0.593946
       three  EUR  1.962279  1.388108 -0.892566
a      one    GBP -2.172854  2.443530 -0.132098
       three  GBP  0.025285 -0.638632 -0.064980
       two    GBP -2.304713 -0.193974 -1.435192
b      one    GBP  1.199589 -0.677936 -1.406234
       three  GBP  4.049782 -0.690240  0.452957
       two    GBP  1.986064  1.610216  1.845328
d      one    GBP -1.801959  1.045427  2.430423
       three  GBP  1.953429 -2.029083  1.939772
       two    GBP -0.706766 -0.891671  1.382272
a      three  USD  0.367974 -0.044724 -0.302375
d      three  USD  1.089630  0.096543  1.418667

[17 rows x 3 columns]
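The same split, scale and reassemble idea can be sketched with a boolean mask on the ccy level instead of pd.IndexSlice. This is a variation, not the answer's exact code; boolean masks do not depend on the index being sorted:

```python
import itertools

import numpy as np
import pandas as pd

# Recreate the question's setup so this snippet runs on its own.
np.random.seed(0)
tuples = list(itertools.product(list("abd"), ["one", "two", "three"], ["USD", "EUR", "GBP"]))
np.random.shuffle(tuples)
idx = pd.MultiIndex.from_tuples(tuples[:-10], names=["letter", "number", "ccy"])
df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx, columns=["val_1", "val_2", "val_3"])
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# Select each currency's rows with a mask on the ccy level, scale that block
# by its rate, then stitch the blocks back together.
ccy = df.index.get_level_values("ccy")
pieces = [df[ccy == code] * rate for code, rate in forex.items()]
converted = pd.concat(pieces)
print(converted)
```

Note one behavioral difference: rows whose currency has no entry in forex are silently dropped here, whereas the mul-based approaches keep them as NaN.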

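Here's another way, via merging. Note that the merge only attaches the matching rate to each row as a value column; the val_* columns still have to be multiplied by it afterwards.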

In [36]: f = forex.to_frame('value')

In [37]: f.index.name = 'ccy'

In [38]: pd.merge(df.reset_index(),f.reset_index(),on='ccy')
Out[38]: 
   letter number  ccy     val_1     val_2     val_3  value
0       a    one  GBP -1.278149  1.437371 -0.077705    1.7
1       b    two  GBP  1.168273  0.947186  1.085487    1.7
2       b  three  GBP  2.382224 -0.406024  0.266445    1.7
3       a    two  GBP -1.355714 -0.114103 -0.844231    1.7
4       b    one  GBP  0.705641 -0.398786 -0.827197    1.7
5       d    two  GBP -0.415745 -0.524512  0.813101    1.7
6       d    one  GBP -1.059976  0.614957  1.429661    1.7
7       d  three  GBP  1.149076 -1.193578  1.141042    1.7
8       a  three  GBP  0.014873 -0.375666 -0.038224    1.7
9       d  three  USD  1.089630  0.096543  1.418667    1.0
10      a  three  USD  0.367974 -0.044724 -0.302375    1.0
11      b    two  EUR -0.229251  2.161717 -0.956931    1.3
12      d    one  EUR  0.067311  0.206499 -0.456881    1.3
13      b  three  EUR -0.211952 -0.080337  0.405398    1.3
14      a    one  EUR  0.118607  1.254414  1.419102    1.3
15      b    one  EUR -0.743856 -2.517437 -1.507096    1.3
16      d  three  EUR  1.509445  1.067775 -0.686589    1.3

[17 rows x 7 columns]
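To finish the merge approach, the value columns can be multiplied by the attached rate and the original three-level index restored. A sketch (the names merged, val_cols and converted are just illustrative):

```python
import itertools

import numpy as np
import pandas as pd

# Recreate the question's setup so this snippet runs on its own.
np.random.seed(0)
tuples = list(itertools.product(list("abd"), ["one", "two", "three"], ["USD", "EUR", "GBP"]))
np.random.shuffle(tuples)
idx = pd.MultiIndex.from_tuples(tuples[:-10], names=["letter", "number", "ccy"])
df = pd.DataFrame(np.random.randn(len(idx), 3), index=idx, columns=["val_1", "val_2", "val_3"])
forex = pd.Series({"USD": 1.0, "EUR": 1.3, "GBP": 1.7})

# As in the answer: turn forex into a one-column frame keyed by 'ccy' and merge.
f = forex.to_frame("value")
f.index.name = "ccy"
merged = pd.merge(df.reset_index(), f.reset_index(), on="ccy")

# The merge only attaches the rate; the value columns still need scaling.
val_cols = ["val_1", "val_2", "val_3"]
merged[val_cols] = merged[val_cols].mul(merged["value"], axis=0)

# Drop the helper column and restore the original three-level row index.
converted = merged.drop(columns="value").set_index(["letter", "number", "ccy"])
print(converted)
```

The merged rows may come back in a different order than df (in the output above they are grouped by currency); if the original order matters, converted.reindex(df.index) should restore it.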
