I have two versions of the same dataset, one stacked and the other not:
>>> a = pandas_datareader.DataReader(["MSFT", "AAPL"], "yahoo")
>>> a
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 47.230000 130.539993 47.770000 131.389999 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 46.919998 129.960007 47.349998 130.660004 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 46.849998 130.119995 47.740002 130.940002 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 46.360001 129.360001 47.160000 130.580002 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 46.139999 128.649994 46.520000 129.690002 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-22 183.509995 318.890015 183.509995 318.890015 184.460007 319.230011 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 181.570007 316.730011 186.500000 324.239990 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 181.809998 318.109985 181.990005 318.709991 176.600006 313.089996 180.199997 316.140015 39517100.0 28236300.0
2020-05-28 181.399994 318.250000 181.399994 318.250000 184.149994 323.440002 180.380005 315.630005 180.740005 316.769989 33810200.0 33390200.0
2020-05-29 183.250000 317.940002 183.250000 317.940002 184.270004 321.149994 180.410004 316.470001 182.729996 319.250000 42130400.0 38383100.0
>>> b = a.stack()
>>> b
Attributes Adj Close Close High Low Open Volume
Date Symbols
2015-06-01 MSFT 42.744289 47.230000 47.770000 46.619999 47.060001 28837300.0
AAPL 120.306801 130.539993 131.389999 130.050003 130.279999 32112800.0
2015-06-02 MSFT 42.463726 46.919998 47.349998 46.619999 46.930000 21498300.0
AAPL 119.772255 129.960007 130.660004 129.320007 129.860001 33667600.0
2015-06-03 MSFT 42.400375 46.849998 47.740002 46.820000 47.369999 28002200.0
... ... ... ... ... ... ...
2020-05-26 AAPL 316.730011 316.730011 324.239990 316.500000 323.500000 31380500.0
2020-05-27 MSFT 181.809998 181.809998 181.990005 176.600006 180.199997 39492600.0
AAPL 318.109985 318.109985 318.709991 313.089996 316.140015 28211100.0
2020-05-28 MSFT 181.580002 181.580002 182.470001 180.389999 180.740005 9760951.0
AAPL 319.850006 319.850006 321.070007 315.630005 316.769989 10119124.0
I'm trying to take a couple of columns from a (the unstacked frame), transform them, and reassign them to the dataset. This works perfectly with b.
>>> b[["Close", "High"]] = b[["Close", "High"]].pct_change().fillna(0)
>>> b
Attributes Adj Close Close High Low Open Volume
Date Symbols
2015-06-01 MSFT 42.744289 0.000000 0.000000 46.619999 47.060001 28837300.0
AAPL 120.306801 1.763921 1.750471 130.050003 130.279999 32112800.0
2015-06-02 MSFT 42.463726 -0.640570 -0.639623 46.619999 46.930000 21498300.0
AAPL 119.772255 1.769821 1.759451 129.320007 129.860001 33667600.0
2015-06-03 MSFT 42.400375 -0.639504 -0.634624 46.820000 47.369999 28002200.0
... ... ... ... ... ... ...
2020-05-26 AAPL 316.730011 0.744396 0.738552 316.500000 323.500000 31380500.0
2020-05-27 MSFT 181.809998 -0.425978 -0.438718 176.600006 180.199997 39492600.0
AAPL 318.109985 0.749684 0.751250 313.089996 316.140015 28211100.0
2020-05-28 MSFT 181.580002 -0.429191 -0.427473 180.389999 180.740005 9760951.0
AAPL 319.850006 0.761483 0.759577 315.630005 316.769989 10119124.0
[2516 rows x 6 columns]
But the same doesn't work for a.
>>> a[["Close", "High"]] = a[["Close", "High"]].pct_change().fillna(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2935, in __setitem__
self._setitem_array(key, value)
File "/home/renatomz/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 2961, in _setitem_array
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
It is perfectly possible if I do it column by column. I'm using a for loop as a temporary solution, but it seems inefficient and unclean to me.
>>> a["Close"] = a["Close"].pct_change().fillna(0)
>>> a
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 0.000000 0.000000 47.770000 131.389999 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 -0.006564 -0.004443 47.349998 130.660004 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 -0.001492 0.001231 47.740002 130.940002 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 -0.010459 -0.005841 47.160000 130.580002 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 -0.004745 -0.005489 46.520000 129.690002 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-21 183.429993 316.850006 -0.012011 -0.007455 186.669998 320.890015 183.289993 315.869995 185.399994 318.660004 29119500.0 25672200.0
2020-05-22 183.509995 318.890015 0.000436 0.006438 184.460007 319.230011 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 -0.010572 -0.006774 186.500000 324.239990 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 0.001322 0.004357 181.990005 318.709991 176.600006 313.089996 180.199997 316.140015 39492600.0 28211100.0
2020-05-28 183.561005 322.510010 0.009631 0.013832 183.820007 323.000000 180.389999 315.630005 180.740005 316.769989 15009134.0 16107365.0
I'm writing this as part of a program that should be agnostic to whether the columns are a MultiIndex or not. Is there a cleaner or faster way to do this without looping over the columns?
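You can take the full MultiIndex of the selected columns and use those tuples as the key, so that the key has the same number of entries as the columns being assigned. (In the output below, the Close columns had already been converted to percent changes by an earlier step, which is why they show -inf and other large values; the High columns show the expected result.)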
a[["Close", "High"]].columns
MultiIndex([('Close', 'MSFT'),
('Close', 'AAPL'),
( 'High', 'MSFT'),
( 'High', 'AAPL')],
names=['Attributes', 'Symbols'])
a[[('Close', 'MSFT'),('Close', 'AAPL'),( 'High', 'MSFT'),( 'High', 'AAPL')]] = a[[('Close', 'MSFT'),('Close', 'AAPL'),( 'High', 'MSFT'),( 'High', 'AAPL')]].pct_change().fillna(0)
Attributes Adj Close Close High Low Open Volume
Symbols MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL
Date
2015-06-01 42.744289 120.306801 0.000000 0.000000 0.000000 0.000000 46.619999 130.050003 47.060001 130.279999 28837300.0 32112800.0
2015-06-02 42.463726 119.772255 -inf -inf -0.008792 -0.005556 46.619999 129.320007 46.930000 129.860001 21498300.0 33667600.0
2015-06-03 42.400375 119.919716 -0.772704 -1.277080 0.008237 0.002143 46.820000 129.899994 47.369999 130.660004 28002200.0 30983500.0
2015-06-04 41.956924 119.219307 6.010459 -5.744469 -0.012149 -0.002749 46.200001 128.910004 46.790001 129.580002 27745500.0 38450100.0
2015-06-05 41.757805 118.564957 -0.546270 -0.060285 -0.013571 -0.006816 45.840000 128.360001 46.310001 129.500000 25438100.0 35626800.0
... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-22 183.509995 318.890015 -1.036311 -1.863583 -0.011839 -0.005173 182.539993 315.350006 183.190002 315.769989 20826900.0 20450800.0
2020-05-26 181.570007 316.730011 -25.238713 -2.052047 0.011059 0.015694 181.100006 316.500000 186.339996 323.500000 36073600.0 31380500.0
2020-05-27 181.809998 318.109985 -1.125029 -1.643233 -0.024182 -0.017055 176.600006 313.089996 180.199997 316.140015 39517100.0 28236300.0
2020-05-28 181.399994 318.250000 -2.706163 -0.898978 0.011869 0.014841 180.380005 315.630005 180.740005 316.769989 33810200.0 33390200.0
2020-05-29 183.250000 317.940002 -5.522368 -3.213063 0.000652 -0.007080 180.410004 316.470001 182.729996 319.250000 42130400.0 38383100.0
1259 rows × 12 columns
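The same idea can be written without typing the tuples by hand. The following is a minimal sketch on a small synthetic frame (the prices are made up for illustration and are not the Yahoo data above); it stores the expanded column keys in a variable, here called cols, and reuses it on both sides of the assignment.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `a`: MultiIndex columns (Attributes x Symbols).
a = pd.DataFrame(
    [[10.0, 100.0, 11.0, 110.0, 5.0, 50.0],
     [11.0, 110.0, 12.1, 121.0, 6.0, 60.0],
     [12.1, 121.0, 12.1, 121.0, 7.0, 70.0]],
    index=pd.date_range("2020-01-01", periods=3, name="Date"),
    columns=pd.MultiIndex.from_product(
        [["Close", "High", "Volume"], ["MSFT", "AAPL"]],
        names=["Attributes", "Symbols"],
    ),
)

# Expand the level-0 labels into the full column keys once:
# four (Attribute, Symbol) tuples for this frame.
cols = a[["Close", "High"]].columns

# Key and value now have the same number of columns, so the plain
# bracket assignment accepts them.
a[cols] = a[cols].pct_change().fillna(0)
```

With flat columns, cols would simply hold the two labels, so the same two lines should also run unchanged on a frame without a MultiIndex.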
For MultiIndex columns, it is much safer to use loc for this kind of assignment.
In the code below, loc is applied along the columns (axis=1; axis=0 would mean working on the rows) and selects "Close" and "High". The replacement values go on the right-hand side of the assignment and are aligned by column label, so the mismatch between the two keys and the four resulting columns should not raise an error.
I'd also suggest reading the pandas docs on MultiIndex for more information; I believe it will help when working with MultiIndex columns:
a.loc(axis=1)[["Close","High"]] = a[["Close","High"]].pct_change().fillna(0)
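Since the question asks for code that does not care whether the columns are a MultiIndex, here is a rough sketch of the loc approach wrapped in a function and run on both kinds of frame. The helper name pct_change_columns and the sample numbers are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def pct_change_columns(df, keys):
    """Replace the columns selected by `keys` with their percent change.

    Hypothetical helper: `keys` are plain column labels for flat columns,
    or level-0 labels when the columns are a MultiIndex.
    """
    changed = df[keys].pct_change().fillna(0)
    # Label-based assignment along the column axis; the right-hand side is
    # aligned on its own column labels, so it may be wider than len(keys).
    df.loc(axis=1)[keys] = changed
    return df


dates = pd.date_range("2020-01-01", periods=3, name="Date")

# Wide frame with MultiIndex columns (synthetic values).
wide = pd.DataFrame(
    [[10.0, 100.0, 11.0, 110.0, 5.0, 50.0],
     [11.0, 110.0, 12.1, 121.0, 6.0, 60.0],
     [12.1, 121.0, 12.1, 121.0, 7.0, 70.0]],
    index=dates,
    columns=pd.MultiIndex.from_product(
        [["Close", "High", "Volume"], ["MSFT", "AAPL"]],
        names=["Attributes", "Symbols"],
    ),
)

# Flat frame for a single symbol (synthetic values).
flat = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.1],
     "High": [11.0, 12.1, 12.1],
     "Volume": [5.0, 6.0, 7.0]},
    index=dates,
)

pct_change_columns(wide, ["Close", "High"])
pct_change_columns(flat, ["Close", "High"])
```

The function modifies the frame it receives and also returns it, so it can be used either as a statement or inside a chain; columns outside keys are left untouched.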