How can I reassign multiple MultiIndex columns at once in pandas?
Given two versions of the same dataset, one stacked and the other not:
>>> a = pandas_datareader.DataReader(["MSFT", "AAPL"], "yahoo")
>>> a
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 47.230000 130.539993 47.770000 131.389999 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 46.919998 129.960007 47.349998 130.660004 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 46.849998 130.119995 47.740002 130.940002 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 46.360001 129.360001 47.160000 130.580002 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 46.139999 128.649994 46.520000 129.690002 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-22 183.509995 318.890015 183.509995 318.890015 184.460007 319.230011 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 181.570007 316.730011 186.500000 324.239990 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 181.809998 318.109985 181.990005 318.709991 176.600006 313.089996 180.199997 316.140015 39517100.0 28236300.0
2020-05-28 181.399994 318.250000 181.399994 318.250000 184.149994 323.440002 180.380005 315.630005 180.740005 316.769989 33810200.0 33390200.0
2020-05-29 183.250000 317.940002 183.250000 317.940002 184.270004 321.149994 180.410004 316.470001 182.729996 319.250000 42130400.0 38383100.0
>>> b = a.stack()
>>> b
Attributes Adj Close Close High Low Open Volume
Date Symbols
2015-06-01 MSFT 42.744289 47.230000 47.770000 46.619999 47.060001 28837300.0
AAPL 120.306801 130.539993 131.389999 130.050003 130.279999 32112800.0
2015-06-02 MSFT 42.463726 46.919998 47.349998 46.619999 46.930000 21498300.0
AAPL 119.772255 129.960007 130.660004 129.320007 129.860001 33667600.0
2015-06-03 MSFT 42.400375 46.849998 47.740002 46.820000 47.369999 28002200.0
... ... ... ... ... ... ...
2020-05-26 AAPL 316.730011 316.730011 324.239990 316.500000 323.500000 31380500.0
2020-05-27 MSFT 181.809998 181.809998 181.990005 176.600006 180.199997 39492600.0
AAPL 318.109985 318.109985 318.709991 313.089996 316.140015 28211100.0
2020-05-28 MSFT 181.580002 181.580002 182.470001 180.389999 180.740005 9760951.0
AAPL 319.850006 319.850006 321.070007 315.630005 316.769989 10119124.0
I'm trying to take a couple of columns from a, transform them, and assign them back to the dataset. This works perfectly with b.
>>> b[["Close", "High"]] = b[["Close", "High"]].pct_change().fillna(0)
>>> b
Attributes Adj Close Close High Low Open Volume
Date Symbols
2015-06-01 MSFT 42.744289 0.000000 0.000000 46.619999 47.060001 28837300.0
AAPL 120.306801 1.763921 1.750471 130.050003 130.279999 32112800.0
2015-06-02 MSFT 42.463726 -0.640570 -0.639623 46.619999 46.930000 21498300.0
AAPL 119.772255 1.769821 1.759451 129.320007 129.860001 33667600.0
2015-06-03 MSFT 42.400375 -0.639504 -0.634624 46.820000 47.369999 28002200.0
... ... ... ... ... ... ...
2020-05-26 AAPL 316.730011 0.744396 0.738552 316.500000 323.500000 31380500.0
2020-05-27 MSFT 181.809998 -0.425978 -0.438718 176.600006 180.199997 39492600.0
AAPL 318.109985 0.749684 0.751250 313.089996 316.140015 28211100.0
2020-05-28 MSFT 181.580002 -0.429191 -0.427473 180.389999 180.740005 9760951.0
AAPL 319.850006 0.761483 0.759577 315.630005 316.769989 10119124.0
[2516 rows x 6 columns]
But the same doesn't work for a.
>>> a[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2935, in __setitem__
self._setitem_array(key, value)
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2961, in _setitem_array
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
This is perfectly possible if I were to do it column by column. I'm using a for loop as a temporary solution, but it seems inefficient and unclean to me.
>>> a["Close"] = a["Close"].pct_change().fillna(0)
>>> a
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 0.000000 0.000000 47.770000 131.389999 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 -0.006564 -0.004443 47.349998 130.660004 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 -0.001492 0.001231 47.740002 130.940002 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 -0.010459 -0.005841 47.160000 130.580002 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 -0.004745 -0.005489 46.520000 129.690002 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-21 183.429993 316.850006 -0.012011 -0.007455 186.669998 320.890015 183.289993 315.869995 185.399994 318.660004 29119500.0 25672200.0
2020-05-22 183.509995 318.890015 0.000436 0.006438 184.460007 319.230011 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 -0.010572 -0.006774 186.500000 324.239990 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 0.001322 0.004357 181.990005 318.709991 176.600006 313.089996 180.199997 316.140015 39492600.0 28211100.0
2020-05-28 183.561005 322.510010 0.009631 0.013832 183.820007 323.000000 180.389999 315.630005 180.740005 316.769989 15009134.0 16107365.0
I'm writing this as part of a program that should be agnostic to whether the columns are a MultiIndex or not. Is there a cleaner/faster way to do this without looping over the columns?
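You can look up the MultiIndex of the selected columns and use those labels as the assignment key. Selecting ["Close", "High"] returns four columns, so the key on the left-hand side has to name all four.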
a[["Close", "High"]].columns
MultiIndex([('Close', 'MSFT'),
('Close', 'AAPL'),
( 'High', 'MSFT'),
( 'High', 'AAPL')],
names=['Attributes', 'Symbols'])
cols = [('Close', 'MSFT'), ('Close', 'AAPL'), ('High', 'MSFT'), ('High', 'AAPL')]
a[cols] = a[cols].pct_change().fillna(0)
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 0.000000 0.000000 0.000000 0.000000 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 -inf -inf -0.008792 -0.005556 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 -0.772704 -1.277080 0.008237 0.002143 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 6.010459 -5.744469 -0.012149 -0.002749 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 -0.546270 -0.060285 -0.013571 -0.006816 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-22 183.509995 318.890015 -1.036311 -1.863583 -0.011839 -0.005173 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 -25.238713 -2.052047 0.011059 0.015694 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 -1.125029 -1.643233 -0.024182 -0.017055 176.600006 313.089996 180.199997 316.140015 39517100.0 28236300.0
2020-05-28 181.399994 318.250000 -2.706163 -0.898978 0.011869 0.014841 180.380005 315.630005 180.740005 316.769989 33810200.0 33390200.0
2020-05-29 183.250000 317.940002 -5.522368 -3.213063 0.000652 -0.007080 180.410004 316.470001 182.729996 319.250000 42130400.0 38383100.0
1259 rows × 12 columns
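Since the question asks for something that does not care whether the columns are a MultiIndex, here is a minimal sketch of the same idea with the labels read from the selection instead of typed out. The helper name transform_columns, the pct function, and the toy prices are all made up for illustration; this is one possible way to wrap it, not an official pandas recipe.

```python
import numpy as np
import pandas as pd


def transform_columns(df, names, func):
    """Hypothetical helper: return a copy of df with func applied to the
    columns selected by names, for flat or MultiIndex columns alike."""
    sub = df[names]                     # on a MultiIndex this expands to every sub-column
    out = df.copy()                     # leave the caller's frame untouched
    out[list(sub.columns)] = func(sub)  # key and value now have the same number of columns
    return out


def pct(frame):
    return frame.pct_change().fillna(0)


# Made-up toy prices (not real market data).
cols = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["AAPL", "MSFT"]],
    names=["Attributes", "Symbols"],
)
values = np.array([
    [100.0, 10.0, 200.0, 20.0, 50.0, 5.0],
    [110.0, 11.0, 220.0, 22.0, 60.0, 6.0],
    [121.0, 12.1, 242.0, 24.2, 70.0, 7.0],
])
wide = pd.DataFrame(values, index=pd.date_range("2020-01-01", periods=3), columns=cols)

# The same data with single-level columns (one symbol only).
flat = wide.xs("MSFT", axis=1, level="Symbols")

wide_out = transform_columns(wide, ["Close", "High"], pct)
flat_out = transform_columns(flat, ["Close", "High"], pct)

print(wide_out)
print(flat_out)
```

With flat columns, list(sub.columns) is just ["Close", "High"]; with MultiIndex columns it becomes the list of tuples shown above, so the same call covers both cases.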
For MultiIndexes, it is much safer to use the loc indexer to get your results.
In the code below, loc works on the columns (axis=0 would mean working on the rows) and selects "Close" and "High". You can safely place the replacement values on the right-hand side of the assignment and should not get any errors.
I'd also suggest reading the pandas docs on MultiIndexes for more information; I believe they will help you when working with MultiIndexes:
a.loc(axis=1)[["Close","High"]] = a[["Close","High"]].pct_change().fillna(0)
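To make the axis argument concrete, here is a small sketch on a made-up frame (the name toy and its values are illustrative only). It checks just the selection side: loc(axis=1) with a list of top-level labels picks the same columns as the plain bracket selection. It does not exercise the assignment itself.

```python
import numpy as np
import pandas as pd

# Made-up frame with two-level columns (not real market data).
cols = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["AAPL", "MSFT"]],
    names=["Attributes", "Symbols"],
)
toy = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=cols)

by_axis = toy.loc(axis=1)[["Close", "High"]]  # axis=1: the labels refer to columns
by_slice = toy.loc[:, ["Close", "High"]]      # same selection with an explicit row slice
plain = toy[["Close", "High"]]                # plain bracket selection, for comparison

print(by_axis.columns.tolist())
```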