Pandas: How to groupby a dataframe and convert the rows to columns and consolidate the rows

This is my data structure:

        date_time             ticker    stock_price     type    bid   ask       impVol               symbol     strike_price  delta  vega  gamma  theta  rho  diff
371     2021-02-19 14:28:45   AMZN      3328.23         put     44.5  46.85     NaN     AMZN210226P03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
370     2021-02-19 14:28:45   AMZN      3328.23         call    43.5  45.80     NaN     AMZN210226C03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
1066    2021-02-19 14:28:55   AMZN      3328.23         call    43.5  45.80     NaN     AMZN210226C03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
1067    2021-02-19 14:28:55   AMZN      3328.23         put     44.5  46.85     NaN     AMZN210226P03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77

My goal is to group by date_time, then create one column each for the put bid and ask and for the call bid and ask.

My expected output would look like this:

        date_time             ticker    stock_price put_bid   put_ask     call_bid    call_ask    impVol  symbol                     strike_price  delta  vega  gamma  theta  rho  diff
371     2021-02-19 14:28:45   AMZN      3328.23     44.5      46.85       43.5        45.80       NaN     AMZN210226P03330000        3330.0    NaN   NaN    NaN    NaN  NaN  1.77
1066    2021-02-19 14:28:55   AMZN      3328.23     43.5      45.80       44.5        46.85       NaN     AMZN210226C03330000        3330.0    NaN   NaN    NaN    NaN  NaN  1.77

I have tried every example I could find, including pivoting like this:

df=pd.pivot_table(df,index=['date_time','type'],columns=df.groupby(['date_time','type']).cumcount().add(1),values=['market_price'],aggfunc='sum')
df.columns=df.columns.map('{0[0]}{0[1]}'.format) 

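I think I'm on the right track, but I just can't figure it out. Any help would be greatly appreciated.

Why are you trying to use groupby? pandas.pivot() does the grouping for you.

You did not provide a reproducible example (hint: please do so next time), so I made up some random data to explain a possible solution. Note that this is not exactly what you need, but it is a starting point: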

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['period'] = np.repeat([1,2],2)
df['product'] = 'kiwi'
df['type'] = np.tile(['buy','sell'],2)
df['price'] = np.arange(1,5)

out = pd.pivot_table(df, index =['period','product'], columns = ['type'] , values ='price' )

You need to specify what you want on the left (index), what you want on top (columns), and which values to show for each combination (values).

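Also, are you sure the date-times will be identical? What if they differ by even one second in the first two rows; is that possible? What if the stock price differs between the first and second rows of the table? I don't know your data, so I can't tell whether this can happen, but it is something to consider.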
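
One way to check both concerns before pivoting is sketched below. It is only an illustration: the frame is retyped from a subset of the columns in the question, and the names dup_mask, has_duplicates and inconsistent are made up for this example.

```python
import pandas as pd

# Subset of the question's columns, retyped by hand for illustration.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2021-02-19 14:28:45", "2021-02-19 14:28:45",
        "2021-02-19 14:28:55", "2021-02-19 14:28:55",
    ]),
    "ticker": ["AMZN"] * 4,
    "stock_price": [3328.23] * 4,
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
    "ask": [46.85, 45.80, 45.80, 46.85],
})

# Each (date_time, ticker, type) should occur exactly once;
# otherwise a plain pivot has more than one value per cell.
dup_mask = df.duplicated(subset=["date_time", "ticker", "type"], keep=False)
has_duplicates = bool(dup_mask.any())

# Each (date_time, ticker) snapshot should carry a single stock_price;
# otherwise put and call rows will not land on the same output row.
prices_per_snapshot = df.groupby(["date_time", "ticker"])["stock_price"].nunique()
inconsistent = prices_per_snapshot[prices_per_snapshot > 1]

print("duplicate keys:", has_duplicates)
print("snapshots with more than one stock_price:", len(inconsistent))
```

If the timestamps of a put and its matching call can drift apart by a second or so, they would need to be aligned first (for example by rounding date_time), which is a decision that depends on how the data was collected.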

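Also note that my example does not specify an aggregation function, so it defaults to the mean. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html

To reorient your data the way you describe using pivot, you need to include every column that varies with type, which in this case includes "symbol" (note the P vs. C in the code):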
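
To make the effect of the default visible, here is a small sketch on made-up data in the same kiwi layout, with one deliberately duplicated (period, product, type) combination. The names by_mean and by_first are only for illustration.

```python
import pandas as pd

# Toy data with two "buy" rows for period 1, so aggregation actually matters.
df = pd.DataFrame({
    "period": [1, 1, 1, 2, 2],
    "product": ["kiwi"] * 5,
    "type": ["buy", "buy", "sell", "buy", "sell"],
    "price": [1.0, 3.0, 2.0, 4.0, 5.0],
})

# No aggfunc given: the two period-1 "buy" prices (1.0 and 3.0) are averaged.
by_mean = pd.pivot_table(
    df, index=["period", "product"], columns="type", values="price"
)

# Explicit aggfunc: keep the first value seen in each group instead.
by_first = pd.pivot_table(
    df, index=["period", "product"], columns="type", values="price",
    aggfunc="first",
)

print(by_mean)
print(by_first)
```

Silent averaging is rarely what you want for quotes, so it may be safer to state the aggregation explicitly, or to use DataFrame.pivot, which raises an error when an index/column combination is duplicated.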

In [10]: pivoted = df.pivot(
    ...:     index=['date_time', 'ticker', 'stock_price', 'impVol', 'strike_price','delta','vega', 'gamma','theta','rho','diff'],
    ...:     columns=['type', 'symbol'],
    ...:     values=['bid', 'ask'],
    ...: )

In [11]: pivoted
Out[11]: 
                                                                                                           bid                                     ask
type                                                                                                       put                call                 put                call
symbol                                                                                     AMZN210226P03330000 AMZN210226C03330000 AMZN210226P03330000 AMZN210226C03330000
date_time           ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77                44.5                43.5               46.85                45.8
2021-02-19 14:28:55 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77                44.5                43.5               46.85                45.8

If you like, you can relabel your columns:

In [12]: pivoted.columns = pd.Index([i[0] + '_' + i[1] for i in pivoted.columns.values])

In [13]: pivoted
Out[13]:
                                                                                            bid_put  bid_call  ask_put  ask_call
date_time           ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77     44.5      43.5    46.85      45.8
2021-02-19 14:28:55 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77     44.5      43.5    46.85      45.8

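Alternatively, you could just exclude symbol from the index, but either way you will need to stack symbol, drop it, or otherwise handle it manually, because its value is not the same for each "type".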
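
As a rough sketch of the "drop it" option, the snippet below discards symbol before pivoting and names the columns put_bid, put_ask, call_bid, call_ask as in the question. The frame is retyped from the question; the all-NaN columns (impVol and the Greeks) are left out for brevity, and the names keys and wide are only for illustration. Passing a list to index requires pandas 1.1 or later.

```python
import pandas as pd

# Data retyped from the question, without the all-NaN columns.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2021-02-19 14:28:45", "2021-02-19 14:28:45",
        "2021-02-19 14:28:55", "2021-02-19 14:28:55",
    ]),
    "ticker": ["AMZN"] * 4,
    "stock_price": [3328.23] * 4,
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
    "ask": [46.85, 45.80, 45.80, 46.85],
    "symbol": ["AMZN210226P03330000", "AMZN210226C03330000",
               "AMZN210226C03330000", "AMZN210226P03330000"],
    "strike_price": [3330.0] * 4,
    "diff": [1.77] * 4,
})

# Columns that are identical for the put and the call of one snapshot.
keys = ["date_time", "ticker", "stock_price", "strike_price", "diff"]

# symbol differs between put and call, so drop it before pivoting.
wide = df.drop(columns="symbol").pivot(
    index=keys, columns="type", values=["bid", "ask"]
)

# Flatten the (value, type) column MultiIndex into names like "put_bid".
wide.columns = [f"{opt_type}_{value}" for value, opt_type in wide.columns]

# Turn the key columns back into regular columns and fix the column order.
wide = wide.reset_index()
wide = wide[keys + ["put_bid", "put_ask", "call_bid", "call_ask"]]

print(wide)
```

If the symbol is still needed, it could be pivoted as a third value column (giving put_symbol and call_symbol) instead of being dropped.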

