
How can I reassign multiple MultiIndex columns at once in pandas?

Given two versions of the same dataset, one stacked and the other not:

>>> import pandas_datareader
>>> a = pandas_datareader.DataReader(["MSFT", "AAPL"], "yahoo")
>>> a
Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume            
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date                                                                                                                                                      
2015-06-01   42.744289  120.306801   47.230000  130.539993   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255   46.919998  129.960007   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   46.849998  130.119995   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307   46.360001  129.360001   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   46.139999  128.649994   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015  183.509995  318.890015  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  181.570007  316.730011  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985  181.809998  318.109985  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000  181.399994  318.250000  184.149994  323.440002  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002  183.250000  317.940002  184.270004  321.149994  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0

>>> b = a.stack()
>>> b
Attributes           Adj Close       Close        High         Low        Open      Volume
Date       Symbols                                                                        
2015-06-01 MSFT      42.744289   47.230000   47.770000   46.619999   47.060001  28837300.0
           AAPL     120.306801  130.539993  131.389999  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726   46.919998   47.349998   46.619999   46.930000  21498300.0
           AAPL     119.772255  129.960007  130.660004  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375   46.849998   47.740002   46.820000   47.369999  28002200.0
...                        ...         ...         ...         ...         ...         ...
2020-05-26 AAPL     316.730011  316.730011  324.239990  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998  181.809998  181.990005  176.600006  180.199997  39492600.0
           AAPL     318.109985  318.109985  318.709991  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002  181.580002  182.470001  180.389999  180.740005   9760951.0
           AAPL     319.850006  319.850006  321.070007  315.630005  316.769989  10119124.0
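The download above depends on a live data source, and the prices it returns can change between runs (a and b above already disagree slightly on 2020-05-27 and 2020-05-28). A minimal offline stand-in for a and b might look like the sketch below. The prices are made-up toy values; only the layout (Date index, Attributes x Symbols columns) mirrors the original.

```python
import pandas as pd

# Hypothetical offline stand-in for the Yahoo download: made-up prices,
# but the same layout (Attributes x Symbols columns, Date index).
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Volume"], ["MSFT", "AAPL"]],
    names=["Attributes", "Symbols"],
)
a = pd.DataFrame(
    [
        [100.0, 200.0, 101.0, 202.0, 1000.0, 2000.0],
        [110.0, 190.0, 111.0, 192.0, 1100.0, 2100.0],
        [99.0, 209.0, 100.0, 210.0, 900.0, 1900.0],
    ],
    index=pd.date_range("2015-06-01", periods=3, name="Date"),
    columns=columns,
)

# stack() moves the innermost column level (Symbols) into the row index,
# which gives the long layout of b: one row per (Date, Symbols) pair.
b = a.stack()

print(a.shape)              # (3, 6)
print(b.shape)              # (6, 3)
print(list(b.index.names))  # ['Date', 'Symbols']
```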

I'm trying to get a couple of columns from a, transform them, and reassign them to the dataset. This works perfectly with b:

>>> b[["Close", "High"]] = b[["Close", "High"]].pct_change().fillna(0)
>>> b
Attributes           Adj Close     Close      High         Low        Open      Volume
Date       Symbols                                                                    
2015-06-01 MSFT      42.744289  0.000000  0.000000   46.619999   47.060001  28837300.0
           AAPL     120.306801  1.763921  1.750471  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726 -0.640570 -0.639623   46.619999   46.930000  21498300.0
           AAPL     119.772255  1.769821  1.759451  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375 -0.639504 -0.634624   46.820000   47.369999  28002200.0
...                        ...       ...       ...         ...         ...         ...
2020-05-26 AAPL     316.730011  0.744396  0.738552  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998 -0.425978 -0.438718  176.600006  180.199997  39492600.0
           AAPL     318.109985  0.749684  0.751250  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002 -0.429191 -0.427473  180.389999  180.740005   9760951.0
           AAPL     319.850006  0.761483  0.759577  315.630005  316.769989  10119124.0

[2516 rows x 6 columns]
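A side note on the b result: pct_change works straight down the rows, and in b the rows alternate between MSFT and AAPL. Each AAPL value is therefore measured against the MSFT row above it (1.763921 on 2015-06-01 is 130.539993 / 47.230000 - 1), and each MSFT value against the previous day's AAPL row. If per-symbol changes are intended, one option is to group by the Symbols index level first. A sketch on made-up toy data shaped like b:

```python
import pandas as pd

# Hypothetical long-format frame shaped like b: one row per (Date, Symbols),
# MSFT and AAPL interleaved. The prices are made up.
index = pd.MultiIndex.from_product(
    [pd.date_range("2015-06-01", periods=3), ["MSFT", "AAPL"]],
    names=["Date", "Symbols"],
)
b = pd.DataFrame(
    {
        "Close": [100.0, 200.0, 110.0, 190.0, 99.0, 209.0],
        "High": [101.0, 202.0, 111.0, 192.0, 100.0, 210.0],
    },
    index=index,
)

# Row-wise pct_change compares each row with the one above it,
# so AAPL is measured against MSFT and vice versa.
naive = b[["Close", "High"]].pct_change().fillna(0)

# Grouping by the Symbols level restricts the comparison to the same symbol;
# the result keeps b's row order, so it can be assigned straight back.
b[["Close", "High"]] = (
    b.groupby(level="Symbols")[["Close", "High"]].pct_change().fillna(0)
)
print(b)
```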

But the same doesn't work for a:

>>> a[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2935, in __setitem__
    self._setitem_array(key, value)
  File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2961, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
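The error comes from the shape of the key: ["Close", "High"] names two top-level labels, while a[["Close", "High"]] has four columns (two attributes times two symbols). As far as I can tell, DataFrame.__setitem__ with a list key and a DataFrame value pairs key entries with value columns one by one and refuses a length mismatch, which is the "Columns must be same length as key" message above. A small sketch that reproduces it on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy frame with the same (Attributes, Symbols) column layout.
columns = pd.MultiIndex.from_product(
    [["Close", "High"], ["MSFT", "AAPL"]], names=["Attributes", "Symbols"]
)
a = pd.DataFrame(
    [[100.0, 200.0, 101.0, 202.0], [110.0, 190.0, 111.0, 192.0]],
    index=pd.date_range("2015-06-01", periods=2, name="Date"),
    columns=columns,
)

key = ["Close", "High"]                # 2 top-level labels
value = a[key].pct_change().fillna(0)  # expands to 4 real columns
print(len(key), value.shape[1])        # 2 4

try:
    # Each entry of the key is paired with one column of the value, so the
    # 2-vs-4 mismatch is rejected before anything is written.
    a[key] = value
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```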

It works fine if I do it column by column. I'm using a for loop as a temporary solution, but it seems inefficient and unclean to me.

>>> a["Close"] = a["Close"].pct_change().fillna(0)
>>> a
Attributes   Adj Close                 Close                  High                     Low                    Open                  Volume            
Symbols           MSFT        AAPL      MSFT      AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date                                                                                                                                                  
2015-06-01   42.744289  120.306801  0.000000  0.000000   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255 -0.006564 -0.004443   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716 -0.001492  0.001231   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307 -0.010459 -0.005841   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957 -0.004745 -0.005489   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...       ...       ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-21  183.429993  316.850006 -0.012011 -0.007455  186.669998  320.890015  183.289993  315.869995  185.399994  318.660004  29119500.0  25672200.0
2020-05-22  183.509995  318.890015  0.000436  0.006438  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011 -0.010572 -0.006774  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985  0.001322  0.004357  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39492600.0  28211100.0
2020-05-28  183.561005  322.510010  0.009631  0.013832  183.820007  323.000000  180.389999  315.630005  180.740005  316.769989  15009134.0  16107365.0
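The loop workaround mentioned above probably looks something like this sketch (hypothetical toy data with the same column layout as a). Each iteration assigns a single top-level label, so the key and the value always have matching columns:

```python
import pandas as pd

# Hypothetical toy frame with the same (Attributes, Symbols) column layout.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Volume"], ["MSFT", "AAPL"]],
    names=["Attributes", "Symbols"],
)
a = pd.DataFrame(
    [
        [100.0, 200.0, 101.0, 202.0, 1000.0, 2000.0],
        [110.0, 190.0, 111.0, 192.0, 1100.0, 2100.0],
        [99.0, 209.0, 100.0, 210.0, 900.0, 1900.0],
    ],
    index=pd.date_range("2015-06-01", periods=3, name="Date"),
    columns=columns,
)

# One top-level label per iteration: a[attr] is a sub-frame with one column
# per symbol, and pandas writes it back under the same attribute.
for attr in ["Close", "High"]:
    a[attr] = a[attr].pct_change().fillna(0)

print(a)
```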

I'm writing this as part of a program that should be agnostic to whether the columns are a MultiIndex or not. Is there any cleaner/faster way to do this without looping over the columns?

You can get the MultiIndex of the selected columns and use it as the assignment key:

a[["Close", "High"]].columns

MultiIndex([('Close', 'MSFT'),
            ('Close', 'AAPL'),
            ( 'High', 'MSFT'),
            ( 'High', 'AAPL')],
           names=['Attributes', 'Symbols'])

cols = [('Close', 'MSFT'), ('Close', 'AAPL'), ('High', 'MSFT'), ('High', 'AAPL')]
a[cols] = a[cols].pct_change().fillna(0)

Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date
2015-06-01   42.744289  120.306801    0.000000    0.000000    0.000000    0.000000   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255        -inf        -inf   -0.008792   -0.005556   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   -0.772704   -1.277080    0.008237    0.002143   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307    6.010459   -5.744469   -0.012149   -0.002749   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   -0.546270   -0.060285   -0.013571   -0.006816   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015   -1.036311   -1.863583   -0.011839   -0.005173  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  -25.238713   -2.052047    0.011059    0.015694  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985   -1.125029   -1.643233   -0.024182   -0.017055  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000   -2.706163   -0.898978    0.011869    0.014841  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002   -5.522368   -3.213063    0.000652   -0.007080  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0

[1259 rows x 12 columns]
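Two remarks on this answer. First, the Close values in the output above are a percentage change of a percentage change: the session appears to have already run a["Close"] = a["Close"].pct_change().fillna(0) from the question, so Close was 0 on 2015-06-01 and the next step divides by zero (-inf on 2015-06-02; -0.772704 on 2015-06-03 is -0.001492 / -0.006564 - 1). The High columns show what the assignment does on untouched data. Second, the tuples don't have to be typed out: a[labels].columns expands top-level labels into the full column keys, and on a frame with flat columns it simply returns the labels, which also covers the "agnostic to MultiIndex" requirement from the question. A sketch with a hypothetical helper name and made-up toy data:

```python
import pandas as pd


def replace_with_pct_change(df, labels):
    """Hypothetical helper: overwrite `labels` in `df` (in place) with pct_change.

    df[labels].columns expands top-level labels into the full column keys
    (tuples for MultiIndex columns, plain labels for flat columns), so the
    key and the value always have the same number of columns.
    """
    key = df[labels].columns
    df[key] = df[key].pct_change().fillna(0)


# Hypothetical toy data with MultiIndex columns, like a.
wide = pd.DataFrame(
    [[100.0, 200.0, 101.0, 202.0, 1000.0, 2000.0],
     [110.0, 190.0, 111.0, 192.0, 1100.0, 2100.0]],
    index=pd.date_range("2015-06-01", periods=2, name="Date"),
    columns=pd.MultiIndex.from_product(
        [["Close", "High", "Volume"], ["MSFT", "AAPL"]],
        names=["Attributes", "Symbols"],
    ),
)
# Hypothetical toy data with flat columns, like b.
flat = pd.DataFrame(
    {"Close": [100.0, 110.0], "High": [101.0, 111.0], "Volume": [1000.0, 1100.0]},
    index=pd.date_range("2015-06-01", periods=2, name="Date"),
)

replace_with_pct_change(wide, ["Close", "High"])
replace_with_pct_change(flat, ["Close", "High"])
print(wide)
print(flat)
```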

With MultiIndex columns, it is much safer to assign through the loc indexer.

In the code below, loc(axis=1) applies the selection to the columns (axis=0 would apply it to the rows) and selects "Close" and "High". You can then put the replacement values on the right-hand side of the assignment without getting the length error.

I'd also suggest reading the pandas documentation on MultiIndexes (the "MultiIndex / advanced indexing" user guide) for more information; it should help you when working with them.

a.loc(axis=1)[["Close","High"]] = a[["Close","High"]].pct_change().fillna(0)
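A quick check of the loc version on both column layouts, again with made-up toy data and a hypothetical make_frame helper. It relies, as the answer does, on .loc aligning a DataFrame on the right-hand side by index and columns, so the four (Attributes, Symbols) columns land where the two top-level labels point:

```python
import pandas as pd


def make_frame(multi):
    """Hypothetical toy data; `multi` picks MultiIndex or flat columns."""
    index = pd.date_range("2015-06-01", periods=3, name="Date")
    if multi:
        columns = pd.MultiIndex.from_product(
            [["Close", "High", "Volume"], ["MSFT", "AAPL"]],
            names=["Attributes", "Symbols"],
        )
        rows = [
            [100.0, 200.0, 101.0, 202.0, 1000.0, 2000.0],
            [110.0, 190.0, 111.0, 192.0, 1100.0, 2100.0],
            [99.0, 209.0, 100.0, 210.0, 900.0, 1900.0],
        ]
    else:
        columns = ["Close", "High", "Volume"]
        rows = [[100.0, 101.0, 1000.0], [110.0, 111.0, 1100.0], [99.0, 100.0, 900.0]]
    return pd.DataFrame(rows, index=index, columns=columns)


results = {}
for multi in (True, False):
    a = make_frame(multi)
    before_volume = a["Volume"].copy()
    # .loc(axis=1)[labels] selects by column label; the DataFrame on the
    # right-hand side is aligned on index and columns before writing.
    a.loc(axis=1)[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
    results[multi] = (a, before_volume)
    print(a, end="\n\n")
```

The same assignment should also work when spelled a.loc[:, ["Close", "High"]], which is the more common form.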

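Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.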

 
© 2020-2024 STACKOOM.COM