
Pandas: How to groupby a dataframe and convert the rows to columns and consolidate the rows

Here's my data structure:

        date_time             ticker    stock_price     type    bid   ask       impVol               symbol     strike_price  delta  vega  gamma  theta  rho  diff
371     2021-02-19 14:28:45   AMZN      3328.23         put     44.5  46.85     NaN     AMZN210226P03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
370     2021-02-19 14:28:45   AMZN      3328.23         call    43.5  45.80     NaN     AMZN210226C03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
1066    2021-02-19 14:28:55   AMZN      3328.23         call    43.5  45.80     NaN     AMZN210226C03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
1067    2021-02-19 14:28:55   AMZN      3328.23         put     44.5  46.85     NaN     AMZN210226P03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77

My goal is to group by date_time, then create columns for the put's bid and ask and the call's bid and ask.

My expected output would be something like this:

        date_time             ticker    stock_price put_bid   put_ask     call_bid    call_ask    impVol  symbol                     strike_price  delta  vega  gamma  theta  rho  diff
371     2021-02-19 14:28:45   AMZN      3328.23     44.5      46.85       43.5        45.80       NaN     AMZN210226P03330000        3330.0    NaN   NaN    NaN    NaN  NaN  1.77
1066    2021-02-19 14:28:55   AMZN      3328.23     44.5      46.85       43.5        45.80       NaN     AMZN210226C03330000        3330.0    NaN   NaN    NaN    NaN  NaN  1.77

I've tried every example I could find, including pivoting like this:

df=pd.pivot_table(df,index=['date_time','type'],columns=df.groupby(['date_time','type']).cumcount().add(1),values=['market_price'],aggfunc='sum')
df.columns=df.columns.map('{0[0]}{0[1]}'.format) 

I think I'm on the right path, but I just can't figure it out. Any help would be incredibly appreciated.

Why are you trying to use a groupby? pandas.pivot_table() does the grouping for you.

You haven't provided a reproducible example (hint: please do next time), so I made up some toy data to illustrate a possible solution. It isn't identical to what you need, but it's a starting point:

import numpy as np
import pandas as pd

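# Toy data: two periods, each with one "buy" and one "sell" price
df = pd.DataFrame()
df['period'] = np.repeat([1, 2], 2)
df['product'] = 'kiwi'
df['type'] = np.tile(['buy', 'sell'], 2)
df['price'] = np.arange(1, 5)

# One row per (period, product), one column per type, cells filled from price
out = pd.pivot_table(df, index=['period', 'product'], columns=['type'], values='price')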

You need to specify what goes on the left (index), what goes across the top (columns), and which values (values) to show for each combination.
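One way to see what pivot_table does with those three arguments, and why a separate groupby isn't needed: essentially the same table can be built by hand with a groupby on the index and columns keys followed by unstack. A sketch on the kiwi data, assuming the default mean aggregation (pivot_table may still differ in details such as result dtypes):

```python
import numpy as np
import pandas as pd

# Same toy data as above
df = pd.DataFrame()
df["period"] = np.repeat([1, 2], 2)
df["product"] = "kiwi"
df["type"] = np.tile(["buy", "sell"], 2)
df["price"] = np.arange(1, 5)

# index -> rows, columns -> new column labels, values -> cell contents
out = pd.pivot_table(df, index=["period", "product"], columns="type", values="price")

# Spelled out by hand: group on index + columns keys, aggregate with the mean
# (pivot_table's default), then move "type" from the row index into the columns.
manual = df.groupby(["period", "product", "type"])["price"].mean().unstack("type")

print(out)
print(manual)
```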

Also, are you sure the put and call rows will always have exactly the same date_time? What if they are even one second apart; is that possible? And what if the stock price differs between the first and second rows of your table? I don't know your data, so I can't tell whether that happens, but it's something to think about.
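A sketch of how you might check for this, on hypothetical data where the last put is stamped one second after its call (the one-second gap and the 10-second bucket are assumptions for illustration): count puts and calls per timestamp, and if exact matches are too strict, floor the timestamps to a coarser bucket before pivoting.

```python
import pandas as pd

# Hypothetical quotes: the second put arrives one second after its call
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2021-02-19 14:28:45", "2021-02-19 14:28:45",
        "2021-02-19 14:28:55", "2021-02-19 14:28:56",
    ]),
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
})

# Count puts and calls per timestamp; anything other than one of each won't pair up
counts = df.groupby(["date_time", "type"]).size().unstack("type", fill_value=0)
unpaired = counts[(counts["put"] != 1) | (counts["call"] != 1)]
print(unpaired)

# Assumed remedy: bucket timestamps to 10-second windows before pivoting
df["bucket"] = df["date_time"].dt.floor("10s")
paired = df.pivot(index="bucket", columns="type", values="bid")
print(paired)
```

Flooring is crude: a pair straddling a bucket boundary (say 14:28:49 and 14:28:51) still ends up split, so whether it's appropriate depends on how your quotes are recorded.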

Also note that my example does not specify an aggregation function (aggfunc), so pivot_table defaults to the mean. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
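The default only matters when several rows land in the same cell. A sketch with made-up data where period 1 has two "buy" prices, comparing the default with aggfunc="first" and aggfunc="sum", plus DataFrame.pivot, which does not aggregate and therefore refuses duplicates:

```python
import pandas as pd

# Made-up data with a duplicate: period 1 has two "buy" prices
df = pd.DataFrame({
    "period": [1, 1, 1, 2, 2],
    "type": ["buy", "buy", "sell", "buy", "sell"],
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Default aggfunc is the mean, so the duplicate cell becomes (1.0 + 2.0) / 2
mean_tbl = pd.pivot_table(df, index="period", columns="type", values="price")
# Keep the first price seen, or add them up, instead
first_tbl = pd.pivot_table(df, index="period", columns="type", values="price", aggfunc="first")
sum_tbl = pd.pivot_table(df, index="period", columns="type", values="price", aggfunc="sum")

# DataFrame.pivot has no aggfunc, so the same duplicate raises a ValueError
pivot_error = None
try:
    df.pivot(index="period", columns="type", values="price")
except ValueError as err:
    pivot_error = err
    print("pivot failed:", err)
```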

To use df.pivot to reorient your data the way you're describing, every column that varies with type has to go into columns rather than index; in this case that includes "symbol" (note the P vs. C in the option symbol):

In [10]: pivoted = df.pivot(
    ...:     index=['date_time', 'ticker', 'stock_price', 'impVol', 'strike_price','delta','vega', 'gamma','theta','rho','diff'],
    ...:     columns=['type', 'symbol'],
    ...:     values=['bid', 'ask'],
    ...: )

In [11]: pivoted
Out[11]: 
                                                                                                           bid                                     ask
type                                                                                                       put                call                 put                call
symbol                                                                                     AMZN210226P03330000 AMZN210226C03330000 AMZN210226P03330000 AMZN210226C03330000
date_time           ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77                44.5                43.5               46.85                45.8
2021-02-19 14:28:55 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77                44.5                43.5               46.85                45.8

If you'd like, you could then relabel your columns:

In [12]: pivoted.columns = pd.Index([i[0] + '_' + i[1] for i in pivoted.columns.values])

In [13]: pivoted
Out[13]:
                                                                                            bid_put  bid_call  ask_put  ask_call
date_time           ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77     44.5      43.5    46.85      45.8
2021-02-19 14:28:55 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77     44.5      43.5    46.85      45.8
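To see why symbol cannot simply stay in the index, here is a sketch on a trimmed copy of the sample rows (only the columns needed for the comparison): because puts and calls carry different symbols, they never share an index entry, so each timestamp splits into two half-empty rows.

```python
import pandas as pd

# Trimmed copy of the question's four rows
df = pd.DataFrame({
    "date_time": ["2021-02-19 14:28:45"] * 2 + ["2021-02-19 14:28:55"] * 2,
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
    "ask": [46.85, 45.80, 45.80, 46.85],
    "symbol": ["AMZN210226P03330000", "AMZN210226C03330000",
               "AMZN210226C03330000", "AMZN210226P03330000"],
})

# symbol in the index: 4 rows, each with NaN in the other option type's columns
with_symbol = df.pivot(index=["date_time", "symbol"], columns="type", values=["bid", "ask"])

# symbol left out: one complete row per timestamp
without_symbol = df.pivot(index="date_time", columns="type", values=["bid", "ask"])

print(with_symbol)
print(without_symbol)
```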

Alternatively, you could leave symbol out of the pivot entirely. Either way, because symbol is not the same for each "type", you need to keep it as a column level (as above), drop it, or handle it manually in some other way.
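A sketch of the "leave symbol out" route, assuming every column other than type, bid, ask and symbol is identical for the put and call rows at a given date_time (date_time is kept as a string here for brevity). Rather than pivoting every column, it pivots only bid/ask, names the results put_bid, call_ask, etc. as in the question, and joins one copy of the shared columns back on. This also keeps the all-NaN columns (impVol and the Greeks) out of the pivot index.

```python
import numpy as np
import pandas as pd

# The question's four sample rows
df = pd.DataFrame({
    "date_time": ["2021-02-19 14:28:45"] * 2 + ["2021-02-19 14:28:55"] * 2,
    "ticker": "AMZN",
    "stock_price": 3328.23,
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
    "ask": [46.85, 45.80, 45.80, 46.85],
    "impVol": np.nan,
    "symbol": ["AMZN210226P03330000", "AMZN210226C03330000",
               "AMZN210226C03330000", "AMZN210226P03330000"],
    "strike_price": 3330.0,
    "delta": np.nan, "vega": np.nan, "gamma": np.nan,
    "theta": np.nan, "rho": np.nan,
    "diff": 1.77,
})

# 1) Spread only the columns that differ by type: one row per timestamp
quotes = df.pivot(index="date_time", columns="type", values=["bid", "ask"])
# Flatten ("bid", "put") -> "put_bid" to match the names in the question
quotes.columns = [f"{opt_type}_{field}" for field, opt_type in quotes.columns]

# 2) Keep one copy of the columns shared by the put and call rows
#    (assumes they really are identical within each timestamp)
shared = (
    df.drop(columns=["type", "bid", "ask", "symbol"])
    .drop_duplicates(subset="date_time")
    .set_index("date_time")
)

# 3) Put the two halves back together on date_time
result = shared.join(quotes).reset_index()
print(result)
```

If you want to keep the option symbols, you could pivot symbol separately in the same way to get put_symbol and call_symbol columns.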
