I have the following Pandas DataFrame with 50 columns, it consists of the daily closing spot price and option price of a few selected stocks, 25 stocks in total but I am only showing 3 here. The prices shown here are just an example:
date tsla_spot tsla_options aapl_spot aapl_options msft_spot msft_options
2020-01-01 350 23.02 257.21 3.45 170.32 3.56
2020-01-02 345.64 21.32 260.10 3.79 123.45 43.21
2020-01-03 345.12 20.43 262.12 3.90 123.54 45.32
I want to have the following pandas dataframe, but can't figure out what how... Instead, is it known as pivot?
date stock spot options
2020-01-01 tsla 350 23.02
2020-01-01 aapl 257.21 3.79
2020-01-01 msft 170.32 3.56
2020-01-02 tsla 345.64 21.32
2020-01-02 aapl 260.10 3.79
2020-01-02 msft 123.45 43.21
Thank you in adv!
You can use pd.wide_to_long
, by first reorganizing the columns:
pattern = r"(?P<first>\w+)_(?P<last>\w+)"
repl = lambda m: f"{m.group('last')}_{m.group('first')}"
df.columns = df.columns.str.replace(pattern, repl)
df.columns
Index(['date', 'spot_tsla', 'options_tsla', 'spot_aapl', 'options_aapl',
'spot_msft', 'options_msft'],
dtype='object')
Now, apply the wide_to_long function:
pd.wide_to_long(df,
stubnames=["spot", "options"],
i="date",
j="stock",
sep="_",
suffix=".+")
spot options
date stock
2020-01-01 tsla 350.00 23.02
2020-01-02 tsla 345.64 21.32
2020-01-03 tsla 345.12 20.43
2020-01-01 aapl 257.21 3.45
2020-01-02 aapl 260.10 3.79
2020-01-03 aapl 262.12 3.90
2020-01-01 msft 170.32 3.56
2020-01-02 msft 123.45 43.21
2020-01-03 msft 123.54 45.32
Another option is to use the pivot_longer function from pyjanitor :
import janitor
df.pivot_longer(index="date",
names_to=("stock", ".value"),
names_sep="_")
date stock spot options
0 2020-01-01 tsla 350.00 23.02
1 2020-01-01 aapl 257.21 3.45
2 2020-01-01 msft 170.32 3.56
3 2020-01-02 tsla 345.64 21.32
4 2020-01-02 aapl 260.10 3.79
5 2020-01-02 msft 123.45 43.21
6 2020-01-03 tsla 345.12 20.43
7 2020-01-03 aapl 262.12 3.90
8 2020-01-03 msft 123.54 45.32
The .value
tells the function to make spot
and options
as new column names, and the rest become values in the stock
column.
Convert columns without separator to index, split columns names for MultiIndex
and reshape by DataFrame.stack
with DataFrame.rename_axis
for new columns names:
df = df.set_index('date')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).rename_axis(['date', 'stock']).reset_index()
print (df)
date stock options spot
0 2020-01-01 aapl 3.45 257.21
1 2020-01-01 msft 3.56 170.32
2 2020-01-01 tsla 23.02 350.00
3 2020-01-02 aapl 3.79 260.10
4 2020-01-02 msft 43.21 123.45
5 2020-01-02 tsla 21.32 345.64
6 2020-01-03 aapl 3.90 262.12
7 2020-01-03 msft 45.32 123.54
8 2020-01-03 tsla 20.43 345.12
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.