How can I reassign multiple MultiIndex columns at once in pandas?
Given two versions of the same dataset, one stacked and the other unstacked.
>>> import pandas_datareader
>>> a = pandas_datareader.DataReader(["MSFT", "AAPL"], "yahoo")
>>> a
Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date
2015-06-01   42.744289  120.306801   47.230000  130.539993   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255   46.919998  129.960007   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   46.849998  130.119995   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307   46.360001  129.360001   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   46.139999  128.649994   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015  183.509995  318.890015  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  181.570007  316.730011  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985  181.809998  318.109985  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000  181.399994  318.250000  184.149994  323.440002  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002  183.250000  317.940002  184.270004  321.149994  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0
>>> b = a.stack()
>>> b
Attributes           Adj Close       Close        High         Low        Open      Volume
Date       Symbols
2015-06-01 MSFT      42.744289   47.230000   47.770000   46.619999   47.060001  28837300.0
           AAPL     120.306801  130.539993  131.389999  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726   46.919998   47.349998   46.619999   46.930000  21498300.0
           AAPL     119.772255  129.960007  130.660004  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375   46.849998   47.740002   46.820000   47.369999  28002200.0
...                        ...         ...         ...         ...         ...         ...
2020-05-26 AAPL     316.730011  316.730011  324.239990  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998  181.809998  181.990005  176.600006  180.199997  39492600.0
           AAPL     318.109985  318.109985  318.709991  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002  181.580002  182.470001  180.389999  180.740005   9760951.0
           AAPL     319.850006  319.850006  321.070007  315.630005  316.769989  10119124.0
I am trying to take several columns from a, transform them, and assign them back to the dataset. This works perfectly with b.
>>> b[["Close", "High"]] = b[["Close", "High"]].pct_change().fillna(0)
>>> b
Attributes           Adj Close       Close        High         Low        Open      Volume
Date       Symbols
2015-06-01 MSFT      42.744289    0.000000    0.000000   46.619999   47.060001  28837300.0
           AAPL     120.306801    1.763921    1.750471  130.050003  130.279999  32112800.0
2015-06-02 MSFT      42.463726   -0.640570   -0.639623   46.619999   46.930000  21498300.0
           AAPL     119.772255    1.769821    1.759451  129.320007  129.860001  33667600.0
2015-06-03 MSFT      42.400375   -0.639504   -0.634624   46.820000   47.369999  28002200.0
...                        ...         ...         ...         ...         ...         ...
2020-05-26 AAPL     316.730011    0.744396    0.738552  316.500000  323.500000  31380500.0
2020-05-27 MSFT     181.809998   -0.425978   -0.438718  176.600006  180.199997  39492600.0
           AAPL     318.109985    0.749684    0.751250  313.089996  316.140015  28211100.0
2020-05-28 MSFT     181.580002   -0.429191   -0.427473  180.389999  180.740005   9760951.0
           AAPL     319.850006    0.761483    0.759577  315.630005  316.769989  10119124.0
[2516 rows x 6 columns]
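One thing worth checking in the stacked result: pct_change() compares each row with the row directly above it, and in b those rows alternate between MSFT and AAPL, which would explain values such as 1.763921. If the intent is a per-symbol, day-over-day change, grouping by the Symbols level first is one option. A minimal sketch on a small made-up frame (b_demo and its prices are invented for illustration and are not the Yahoo data):

```python
import pandas as pd

# Hypothetical stacked frame with the same (Date, Symbols) index layout as b.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]), ["MSFT", "AAPL"]],
    names=["Date", "Symbols"],
)
b_demo = pd.DataFrame(
    {
        "Close": [100.0, 200.0, 110.0, 220.0, 121.0, 198.0],
        "High": [100.0, 200.0, 120.0, 210.0, 132.0, 231.0],
    },
    index=index,
)

cols = ["Close", "High"]
# Group by symbol so each row is compared with the previous row of the SAME symbol.
per_symbol = b_demo.groupby(level="Symbols")[cols].pct_change().fillna(0)
b_demo[cols] = per_symbol
```

The multi-column assignment itself is unchanged; only the right-hand side is computed per group.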
But the same assignment does not work with a.
>>> a[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2935, in __setitem__
self._setitem_array(key, value)
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2961, in _setitem_array
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
If I do it one column at a time, it works fine. I am using a for loop as a temporary workaround, but it seems inefficient and unclean to me.
>>> a["Close"] = a["Close"].pct_change().fillna(0)
>>> a
Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date
2015-06-01   42.744289  120.306801    0.000000    0.000000   47.770000  131.389999   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255   -0.006564   -0.004443   47.349998  130.660004   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   -0.001492    0.001231   47.740002  130.940002   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307   -0.010459   -0.005841   47.160000  130.580002   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   -0.004745   -0.005489   46.520000  129.690002   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-21  183.429993  316.850006   -0.012011   -0.007455  186.669998  320.890015  183.289993  315.869995  185.399994  318.660004  29119500.0  25672200.0
2020-05-22  183.509995  318.890015    0.000436    0.006438  184.460007  319.230011  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011   -0.010572   -0.006774  186.500000  324.239990  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985    0.001322    0.004357  181.990005  318.709991  176.600006  313.089996  180.199997  316.140015  39492600.0  28211100.0
2020-05-28  183.561005  322.510010    0.009631    0.013832  183.820007  323.000000  180.389999  315.630005  180.740005  316.769989  15009134.0  16107365.0
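The loop-based workaround is mentioned but not shown in the question. A minimal sketch of what it might look like, using a small made-up frame with the same two-level column layout as a (a_demo and its values are invented for illustration):

```python
import pandas as pd

# Hypothetical wide frame with an (Attributes, Symbols) column MultiIndex like a.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["MSFT", "AAPL"]], names=["Attributes", "Symbols"]
)
a_demo = pd.DataFrame(
    [
        [100.0, 200.0, 100.0, 200.0, 90.0, 190.0],
        [110.0, 220.0, 120.0, 210.0, 95.0, 195.0],
        [121.0, 198.0, 132.0, 231.0, 99.0, 180.0],
    ],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    columns=columns,
)

# One top-level label at a time: a_demo[name] is a sub-frame (MSFT, AAPL),
# which pandas can assign back under the same top-level key.
for name in ["Close", "High"]:
    a_demo[name] = a_demo[name].pct_change().fillna(0)
```

Each iteration assigns a two-column sub-frame to a key that also resolves to two columns, so the length check that fails for the list key is satisfied.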
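I am writing this as part of a program that should not need to know whether the columns are a MultiIndex. Is there a cleaner/faster way to do this without looping over the columns?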
You can retrieve the MultiIndex of the selected columns and adapt the assignment to it.
a[["Close", "High"]].columns
MultiIndex([('Close', 'MSFT'),
('Close', 'AAPL'),
( 'High', 'MSFT'),
( 'High', 'AAPL')],
names=['Attributes', 'Symbols'])
cols = [('Close', 'MSFT'), ('Close', 'AAPL'), ('High', 'MSFT'), ('High', 'AAPL')]
a[cols] = a[cols].pct_change().fillna(0)
Attributes   Adj Close                   Close                    High                     Low                    Open                  Volume
Symbols           MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL
Date
2015-06-01   42.744289  120.306801    0.000000    0.000000    0.000000    0.000000   46.619999  130.050003   47.060001  130.279999  28837300.0  32112800.0
2015-06-02   42.463726  119.772255        -inf        -inf   -0.008792   -0.005556   46.619999  129.320007   46.930000  129.860001  21498300.0  33667600.0
2015-06-03   42.400375  119.919716   -0.772704   -1.277080    0.008237    0.002143   46.820000  129.899994   47.369999  130.660004  28002200.0  30983500.0
2015-06-04   41.956924  119.219307    6.010459   -5.744469   -0.012149   -0.002749   46.200001  128.910004   46.790001  129.580002  27745500.0  38450100.0
2015-06-05   41.757805  118.564957   -0.546270   -0.060285   -0.013571   -0.006816   45.840000  128.360001   46.310001  129.500000  25438100.0  35626800.0
...                ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
2020-05-22  183.509995  318.890015   -1.036311   -1.863583   -0.011839   -0.005173  182.539993  315.350006  183.190002  315.769989  20826900.0  20450800.0
2020-05-26  181.570007  316.730011  -25.238713   -2.052047    0.011059    0.015694  181.100006  316.500000  186.339996  323.500000  36073600.0  31380500.0
2020-05-27  181.809998  318.109985   -1.125029   -1.643233   -0.024182   -0.017055  176.600006  313.089996  180.199997  316.140015  39517100.0  28236300.0
2020-05-28  181.399994  318.250000   -2.706163   -0.898978    0.011869    0.014841  180.380005  315.630005  180.740005  316.769989  33810200.0  33390200.0
2020-05-29  183.250000  317.940002   -5.522368   -3.213063    0.000652   -0.007080  180.410004  316.470001  182.729996  319.250000  42130400.0  38383100.0
1259 rows × 12 columns
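Rather than typing the tuples out, the column labels returned by the selection can be reused as the assignment key, which should also cover the case where the columns are not a MultiIndex. A minimal sketch, assuming the transformation keeps the selected columns in the same order; transform_columns and pct are hypothetical helper names, and the demo values are invented:

```python
import pandas as pd


def transform_columns(df, names, func):
    """Apply func to df[names] and write the result back into df.

    Hypothetical helper: it reuses the selection's own column labels as the
    assignment key, so key and value have the same number of columns.
    """
    selected = df[names]
    # list(...) yields plain labels for flat columns and tuples for a MultiIndex.
    df[list(selected.columns)] = func(selected)
    return df


def pct(frame):
    return frame.pct_change().fillna(0)


# Same call for MultiIndex columns...
wide = pd.DataFrame(
    [[100.0, 200.0, 100.0, 200.0, 90.0, 190.0],
     [110.0, 220.0, 120.0, 210.0, 95.0, 195.0]],
    columns=pd.MultiIndex.from_product([["Close", "High", "Low"], ["MSFT", "AAPL"]]),
)
transform_columns(wide, ["Close", "High"], pct)

# ...and for flat columns.
flat = pd.DataFrame({"Close": [100.0, 110.0], "High": [100.0, 120.0], "Low": [90.0, 95.0]})
transform_columns(flat, ["Close", "High"], pct)
```

As a side note, the -inf and oversized values in the Close columns of the output above are consistent with pct_change() having been applied to Close a second time (it had already been converted in the question), rather than with a problem in the assignment itself.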
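With a MultiIndex, it is much safer to use the loc method to get the result.
In the code below, loc targets the columns via axis=1 (axis=0 would mean operating on rows) and selects "Close" and "High". You can safely put the replacement values on the other side of the assignment, and no error is raised.
I also recommend reading the pandas documentation on MultiIndexes for more information; I believe it will help you when working with MultiIndexes: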
a.loc(axis=1)[["Close","High"]] = a[["Close","High"]].pct_change().fillna(0)
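To try the loc form in isolation, here is a minimal sketch on a small made-up frame with the same column layout as a (a_demo and its numbers are invented for illustration). Since loc aligns the right-hand side on labels, the four (Attributes, Symbols) columns on both sides should line up:

```python
import pandas as pd

# Hypothetical wide frame with an (Attributes, Symbols) column MultiIndex like a.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["MSFT", "AAPL"]], names=["Attributes", "Symbols"]
)
a_demo = pd.DataFrame(
    [
        [100.0, 200.0, 100.0, 200.0, 90.0, 190.0],
        [110.0, 220.0, 120.0, 210.0, 95.0, 195.0],
        [121.0, 198.0, 132.0, 231.0, 99.0, 180.0],
    ],
    columns=columns,
)

cols = ["Close", "High"]
# axis=1 tells loc that the list holds column labels (level 0 of the MultiIndex here).
a_demo.loc(axis=1)[cols] = a_demo[cols].pct_change().fillna(0)
```

The same line should also work when the columns are a flat Index, which matches the requirement that the program not care about the column type.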