Pivoting Pandas DataFrame

I have the following Pandas DataFrame with 50 value columns. It holds the daily closing spot price and option price for a selection of stocks. There are 25 stocks in total, but I am only showing 3 here, and the prices are just examples:

date           tsla_spot   tsla_options aapl_spot  aapl_options msft_spot  msft_options 
2020-01-01       350            23.02      257.21      3.45        170.32      3.56
2020-01-02       345.64         21.32      260.10      3.79        123.45      43.21
2020-01-03       345.12         20.43      262.12      3.90        123.54      45.32

I want to get the following DataFrame, but I can't figure out how. Is this operation known as a pivot?

date          stock    spot    options
2020-01-01     tsla     350      23.02
2020-01-01     aapl     257.21   3.45
2020-01-01     msft     170.32   3.56
2020-01-02     tsla     345.64   21.32
2020-01-02     aapl     260.10   3.79
2020-01-02     msft     123.45    43.21

Thanks in advance!

You can use pd.wide_to_long, after first reorganizing the column names so that the shared stub (spot or options) comes before the stock symbol:

pattern = r"(?P<first>\w+)_(?P<last>\w+)"
repl = lambda m: f"{m.group('last')}_{m.group('first')}"
# a callable replacement requires regex=True, which is not the default in pandas 2.0+
df.columns = df.columns.str.replace(pattern, repl, regex=True)

df.columns
Index(['date', 'spot_tsla', 'options_tsla', 'spot_aapl', 'options_aapl',
   'spot_msft', 'options_msft'],
  dtype='object')

Now, apply the wide_to_long function:

pd.wide_to_long(df, 
                stubnames=["spot", "options"], 
                i="date", 
                j="stock", 
                sep="_", 
                suffix=".+")

                     spot   options
date       stock        
2020-01-01  tsla    350.00  23.02
2020-01-02  tsla    345.64  21.32
2020-01-03  tsla    345.12  20.43
2020-01-01  aapl    257.21  3.45
2020-01-02  aapl    260.10  3.79
2020-01-03  aapl    262.12  3.90
2020-01-01  msft    170.32  3.56
2020-01-02  msft    123.45  43.21
2020-01-03  msft    123.54  45.32
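For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the three-stock sample from the question and swaps the column names with a small helper (`swap`, a hypothetical name, used here in place of the regex replacement above) before calling pd.wide_to_long. The final sort and column selection are only there to get close to the layout asked for in the question.

```python
import pandas as pd

# Sample data copied from the question (3 of the 25 stocks).
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "tsla_spot": [350.00, 345.64, 345.12],
    "tsla_options": [23.02, 21.32, 20.43],
    "aapl_spot": [257.21, 260.10, 262.12],
    "aapl_options": [3.45, 3.79, 3.90],
    "msft_spot": [170.32, 123.45, 123.54],
    "msft_options": [3.56, 43.21, 45.32],
})


def swap(name):
    """Turn '<stock>_<kind>' into '<kind>_<stock>'; leave names without '_' alone."""
    stock, sep, kind = name.partition("_")
    return f"{kind}_{stock}" if sep else name


renamed = df.rename(columns=swap)

long_df = (
    pd.wide_to_long(
        renamed,
        stubnames=["spot", "options"],
        i="date",
        j="stock",
        sep="_",
        suffix=r"\w+",  # the stock symbol is not numeric, so the default suffix would not match
    )
    .reset_index()
    # stable sort: rows are grouped by date, stocks keep the order wide_to_long produced
    .sort_values("date", kind="stable")
    .reset_index(drop=True)
)[["date", "stock", "spot", "options"]]

print(long_df)
```

With all 25 stocks the code stays the same, as long as every column follows the `<stock>_<kind>` naming pattern.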

Another option is to use the pivot_longer function from pyjanitor:

import janitor

df.pivot_longer(index="date", 
                names_to=("stock", ".value"), 
                names_sep="_")


         date stock    spot  options
0  2020-01-01  tsla  350.00    23.02
1  2020-01-01  aapl  257.21     3.45
2  2020-01-01  msft  170.32     3.56
3  2020-01-02  tsla  345.64    21.32
4  2020-01-02  aapl  260.10     3.79
5  2020-01-02  msft  123.45    43.21
6  2020-01-03  tsla  345.12    20.43
7  2020-01-03  aapl  262.12     3.90
8  2020-01-03  msft  123.54    45.32

Each column name is split on "_" into two parts. The part paired with .value (spot, options) stays as a column header in the result, while the other part (tsla, aapl, msft) becomes the values of the new stock column.
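To make the role of .value more concrete, here is a rough pandas-only sketch of the same reshaping, assuming the sample data from the question. It is not how pyjanitor implements pivot_longer, just an illustration in three steps: melt everything to long form, split the old column names, then spread the spot/options part back out into columns. The intermediate names (`name`, `price`, `kind`) are made up for this example. It also shows why the operation is usually described as wide-to-long (melt, or unpivot) rather than a pivot, even though a pivot appears as the last step.

```python
import pandas as pd

# Sample data copied from the question (3 of the 25 stocks).
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "tsla_spot": [350.00, 345.64, 345.12],
    "tsla_options": [23.02, 21.32, 20.43],
    "aapl_spot": [257.21, 260.10, 262.12],
    "aapl_options": [3.45, 3.79, 3.90],
    "msft_spot": [170.32, 123.45, 123.54],
    "msft_options": [3.56, 43.21, 45.32],
})

# Step 1: melt every non-date column into (name, price) pairs.
melted = df.melt(id_vars="date", var_name="name", value_name="price")

# Step 2: split e.g. "tsla_spot" into stock="tsla" and kind="spot".
melted[["stock", "kind"]] = melted["name"].str.split("_", n=1, expand=True)

# Step 3: spread 'kind' back out into columns. This is the part that
# '.value' takes care of in pivot_longer.
long_df = (
    melted.pivot(index=["date", "stock"], columns="kind", values="price")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'kind' label on the columns
)[["date", "stock", "spot", "options"]]

print(long_df)
```

Note that pivot sorts the index, so within each date the stocks come out in alphabetical order rather than in their original column order.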

Move the column that has no separator (date) into the index, split the remaining column names on "_" to get a MultiIndex, then reshape with DataFrame.stack and use DataFrame.rename_axis to name the resulting index levels:

df = df.set_index('date')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).rename_axis(['date', 'stock']).reset_index()
print(df)
         date stock  options    spot
0  2020-01-01  aapl     3.45  257.21
1  2020-01-01  msft     3.56  170.32
2  2020-01-01  tsla    23.02  350.00
3  2020-01-02  aapl     3.79  260.10
4  2020-01-02  msft    43.21  123.45
5  2020-01-02  tsla    21.32  345.64
6  2020-01-03  aapl     3.90  262.12
7  2020-01-03  msft    45.32  123.54
8  2020-01-03  tsla    20.43  345.12
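The output above has the stocks and the spot/options columns in alphabetical order. If the original order matters (tsla, aapl, msft and spot before options, as in the question), one possible follow-up is sketched below. It assumes the sample data from the question and uses an ordered categorical for the stock column; `stock_order` is a name introduced only for this example. Depending on the pandas version, stack may emit a FutureWarning about its changing implementation, which does not affect the result here.

```python
import pandas as pd

# Sample data copied from the question (3 of the 25 stocks).
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "tsla_spot": [350.00, 345.64, 345.12],
    "tsla_options": [23.02, 21.32, 20.43],
    "aapl_spot": [257.21, 260.10, 262.12],
    "aapl_options": [3.45, 3.79, 3.90],
    "msft_spot": [170.32, 123.45, 123.54],
    "msft_options": [3.56, 43.21, 45.32],
})

wide = df.set_index("date")

# Remember the order in which the stocks first appear in the columns.
stock_order = list(dict.fromkeys(col.split("_")[0] for col in wide.columns))

# Same reshaping as above: MultiIndex columns, then stack the stock level.
wide.columns = wide.columns.str.split("_", expand=True)
long_df = wide.stack(level=0).rename_axis(["date", "stock"]).reset_index()

# An ordered categorical makes sort_values follow the original stock order.
long_df["stock"] = pd.Categorical(long_df["stock"], categories=stock_order, ordered=True)
long_df = (
    long_df.sort_values(["date", "stock"])
    .reset_index(drop=True)
)[["date", "stock", "spot", "options"]]

print(long_df)
```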

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.