
How can I reassign multiple MultiIndex columns at once in pandas?

I have two versions of the same dataset, one stacked and the other not.

>>> import pandas_datareader
>>> a = pandas_datareader.DataReader(["MSFT", "AAPL"], "yahoo")
>>> a
Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume            
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date                                                                                                                                                      
2015-06-01   42.744289  120.306801   47.230000  130.539993   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255   46.919998  129.960007   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   46.849998  130.119995   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307   46.360001  129.360001   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   46.139999  128.649994   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015  183.509995  318.890015  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  181.570007  316.730011  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985  181.809998  318.109985  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000  181.399994  318.250000  184.149994  323.440002  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002  183.250000  317.940002  184.270004  321.149994  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0

>>> b = a.stack()
>>> b
Attributes           Adj Close       Close        High         Low        Open      Volume
Date       Symbols                                                                        
2015-06-01 MSFT      42.744289   47.230000   47.770000   46.619999   47.060001  28837300.0
           AAPL     120.306801  130.539993  131.389999  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726   46.919998   47.349998   46.619999   46.930000  21498300.0
           AAPL     119.772255  129.960007  130.660004  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375   46.849998   47.740002   46.820000   47.369999  28002200.0
...                        ...         ...         ...         ...         ...         ...
2020-05-26 AAPL     316.730011  316.730011  324.239990  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998  181.809998  181.990005  176.600006  180.199997  39492600.0
           AAPL     318.109985  318.109985  318.709991  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002  181.580002  182.470001  180.389999  180.740005   9760951.0
           AAPL     319.850006  319.850006  321.070007  315.630005  316.769989  10119124.0

I'm trying to take a couple of columns from a, transform them, and assign them back to the dataset. This works perfectly with b.

>>> b[["Close", "High"]] = b[["Close", "High"]].pct_change().fillna(0)
>>> b
Attributes           Adj Close     Close      High         Low        Open      Volume
Date       Symbols                                                                    
2015-06-01 MSFT      42.744289  0.000000  0.000000   46.619999   47.060001  28837300.0
           AAPL     120.306801  1.763921  1.750471  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726 -0.640570 -0.639623   46.619999   46.930000  21498300.0
           AAPL     119.772255  1.769821  1.759451  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375 -0.639504 -0.634624   46.820000   47.369999  28002200.0
...                        ...       ...       ...         ...         ...         ...
2020-05-26 AAPL     316.730011  0.744396  0.738552  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998 -0.425978 -0.438718  176.600006  180.199997  39492600.0
           AAPL     318.109985  0.749684  0.751250  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002 -0.429191 -0.427473  180.389999  180.740005   9760951.0
           AAPL     319.850006  0.761483  0.759577  315.630005  316.769989  10119124.0

[2516 rows x 6 columns]

But the same doesn't work for a.

>>> a[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2935, in __setitem__
    self._setitem_array(key, value)
  File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2961, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

It works fine if I do it column by column. I'm using a for loop as a temporary solution, but it seems inefficient and unclean to me.

>>> a["Close"] = a["Close"].pct_change().fillna(0)
>>> a
Attributes   Adj Close                 Close                  High                     Low                    Open                  Volume            
Symbols           MSFT        AAPL      MSFT      AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date                                                                                                                                                  
2015-06-01   42.744289  120.306801  0.000000  0.000000   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255 -0.006564 -0.004443   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716 -0.001492  0.001231   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307 -0.010459 -0.005841   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957 -0.004745 -0.005489   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...       ...       ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-21  183.429993  316.850006 -0.012011 -0.007455  186.669998  320.890015  183.289993  315.869995  185.399994  318.660004  29119500.0  25672200.0
2020-05-22  183.509995  318.890015  0.000436  0.006438  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011 -0.010572 -0.006774  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985  0.001322  0.004357  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39492600.0  28211100.0
2020-05-28  183.561005  322.510010  0.009631  0.013832  183.820007  323.000000  180.389999  315.630005  180.740005  316.769989  15009134.0  16107365.0

I'm writing this as part of a program that should be agnostic to whether the columns are a MultiIndex or not. Is there a cleaner/faster way to do this without looping over the columns?

You can look up the MultiIndex of the selected columns and use its full column tuples as the key.

a[["Close", "High"]].columns

MultiIndex([('Close', 'MSFT'),
            ('Close', 'AAPL'),
            ( 'High', 'MSFT'),
            ( 'High', 'AAPL')],
           names=['Attributes', 'Symbols'])

a[[('Close', 'MSFT'),('Close', 'AAPL'),( 'High', 'MSFT'),( 'High', 'AAPL')]] = a[[('Close', 'MSFT'),('Close', 'AAPL'),( 'High', 'MSFT'),( 'High', 'AAPL')]].pct_change().fillna(0)

Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date
2015-06-01   42.744289  120.306801    0.000000    0.000000    0.000000    0.000000   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255        -inf        -inf   -0.008792   -0.005556   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   -0.772704   -1.277080    0.008237    0.002143   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307    6.010459   -5.744469   -0.012149   -0.002749   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   -0.546270   -0.060285   -0.013571   -0.006816   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015   -1.036311   -1.863583   -0.011839   -0.005173  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  -25.238713   -2.052047    0.011059    0.015694  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985   -1.125029   -1.643233   -0.024182   -0.017055  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000   -2.706163   -0.898978    0.011869    0.014841  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002   -5.522368   -3.213063    0.000652   -0.007080  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0

[1259 rows x 12 columns]

Note that the Close columns in this output appear to have already been converted with pct_change before the statement was run, so the Close values shown here (including -inf) are a percentage change of a percentage change. The High columns show the intended result.
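To avoid typing the column tuples by hand, the full labels can be read off the selection itself. The sketch below uses made-up numbers in place of the downloaded prices, and the helper name pct_change_columns is a hypothetical one chosen for illustration, not something from pandas or from the answer above. Because the key then has one entry per selected column, the same function should work whether the columns are a MultiIndex or flat.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (values are made up).
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["MSFT", "AAPL"]],
    names=["Attributes", "Symbols"],
)
index = pd.date_range("2020-01-01", periods=4, name="Date")
a = pd.DataFrame(np.arange(1.0, 25.0).reshape(4, 6), index=index, columns=columns)
original = a.copy()  # kept only so the result can be compared afterwards


def pct_change_columns(df, names):
    """Replace the selected columns of df with their percentage change, in place.

    Hypothetical helper: `names` are top-level (or flat) column labels.
    """
    # Full labels of the selection: tuples for MultiIndex columns,
    # plain labels for flat columns.
    cols = df[names].columns.tolist()
    df[cols] = df[cols].pct_change().fillna(0)
    return df


# MultiIndex columns: the key expands to four (Attribute, Symbol) tuples.
pct_change_columns(a, ["Close", "High"])

# Flat columns: the key stays ["Close", "High"].
flat = original.xs("MSFT", axis=1, level="Symbols").copy()
pct_change_columns(flat, ["Close", "High"])
```

The original statement fails because the key ["Close", "High"] has two entries while the selected frame has four columns; expanding the key to the full labels makes the lengths match.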

With MultiIndex columns, it is much safer to assign through the loc indexer.

In the code below, loc(axis=1) works on the columns (axis=0 would work on the rows) and selects "Close" and "High". You can then put the replacement values on the right-hand side of the assignment, and you should not get any errors.

I'd also suggest reading the pandas docs on MultiIndex for more information; I believe they will help you when working with multi-indexed data. The assignment looks like this:

a.loc(axis=1)[["Close","High"]] = a[["Close","High"]].pct_change().fillna(0)
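A minimal way to try this without downloading anything is sketched below. The numbers are synthetic and the frame has fewer columns than the real data; only the loc(axis=1) assignment itself comes from the line above. The same statement is also run against a flat-column frame, which is the case the question cares about.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (values are made up).
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["MSFT", "AAPL"]],
    names=["Attributes", "Symbols"],
)
index = pd.date_range("2020-01-01", periods=4, name="Date")
a = pd.DataFrame(np.arange(1.0, 25.0).reshape(4, 6), index=index, columns=columns)
original = a.copy()  # kept only so the result can be compared afterwards

# MultiIndex columns: the list selects every column under "Close" and "High".
a.loc(axis=1)[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)

# Flat columns: exactly the same statement.
flat = original.xs("MSFT", axis=1, level="Symbols").copy()
flat.loc(axis=1)[["Close", "High"]] = flat[["Close", "High"]].pct_change().fillna(0)

print(a)
print(flat)
```

With loc, pandas aligns the right-hand frame on its row and column labels instead of pairing key entries with columns one by one, which is why the two-label key is not a problem here.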
