Pandas: How to groupby a dataframe and convert the rows to columns and consolidate the rows
Here is my data structure:
date_time ticker stock_price type bid ask impVol symbol strike_price delta vega gamma theta rho diff
371 2021-02-19 14:28:45 AMZN 3328.23 put 44.5 46.85 NaN AMZN210226P03330000 3330.0 NaN NaN NaN NaN NaN 1.77
370 2021-02-19 14:28:45 AMZN 3328.23 call 43.5 45.80 NaN AMZN210226C03330000 3330.0 NaN NaN NaN NaN NaN 1.77
1066 2021-02-19 14:28:55 AMZN 3328.23 call 43.5 45.80 NaN AMZN210226C03330000 3330.0 NaN NaN NaN NaN NaN 1.77
1067 2021-02-19 14:28:55 AMZN 3328.23 put 44.5 46.85 NaN AMZN210226P03330000 3330.0 NaN NaN NaN NaN NaN 1.77
My goal is to group by date_time and then create separate columns for the put bid and ask and for the call bid and ask.
My expected output would look like this:
date_time ticker stock_price put_bid put_ask call_bid call_ask impVol symbol strike_price delta vega gamma theta rho diff
371 2021-02-19 14:28:45 AMZN 3328.23 44.5 46.85 43.5 45.80 NaN AMZN210226P03330000 3330.0 NaN NaN NaN NaN NaN 1.77
1066 2021-02-19 14:28:55 AMZN 3328.23 43.5 45.80 44.5 46.85 NaN AMZN210226C03330000 3330.0 NaN NaN NaN NaN NaN 1.77
I have tried every example I could find, including a pivot like this:
df=pd.pivot_table(df,index=['date_time','type'],columns=df.groupby(['date_time','type']).cumcount().add(1),values=['market_price'],aggfunc='sum')
df.columns=df.columns.map('{0[0]}{0[1]}'.format)
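I think I am on the right track, but I just can't figure it out. Any help would be greatly appreciated.
Why are you trying to use groupby? pandas.pivot() does the grouping for you.
You did not provide a reproducible example (hint: please do so next time), so I made up some random data to illustrate a possible solution. Note that this is not exactly what you need, but it is a starting point: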
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['period'] = np.repeat([1,2],2)
df['product'] = 'kiwi'
df['type'] = np.tile(['buy','sell'],2)
df['price'] = np.arange(1,5)
out = pd.pivot_table(df, index =['period','product'], columns = ['type'] , values ='price' )
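You need to specify what you want on the left (index), what you want on top (columns), and which values to display for each combination (values).
Also, are you sure the date-times will be identical? Could the timestamps of the two rows in a pair differ, even by one second? And what if the stock price differs between the first and second rows of the table? I don't know your data, so I can't tell whether that is possible, but it is something to consider.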
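One way to check those concerns before pivoting is to count rows per timestamp and type. The following is only a sketch on made-up data: the sample frame is hypothetical, and its last row is deliberately one second off so that the mismatch shows up.

```python
import pandas as pd

# Hypothetical sample with the question's column names; the last row is
# intentionally one second later than its counterpart.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2021-02-19 14:28:45", "2021-02-19 14:28:45",
        "2021-02-19 14:28:55", "2021-02-19 14:28:56",
    ]),
    "type": ["put", "call", "call", "put"],
    "stock_price": [3328.23, 3328.23, 3328.23, 3328.23],
})

# Count how many rows of each type share a timestamp.
counts = df.groupby(["date_time", "type"]).size().unstack(fill_value=0)

# Timestamps that do not have exactly one put and one call.
unpaired = counts[(counts["put"] != 1) | (counts["call"] != 1)]
print(unpaired)

# Number of distinct stock prices per timestamp; a value above 1 means
# the rows for that timestamp disagree on stock_price.
price_variants = df.groupby("date_time")["stock_price"].nunique()
print(price_variants)
```

If `unpaired` is not empty, the timestamps probably need to be aligned first (for example by rounding them), which is a judgment call that depends on how the data was collected.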
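Also note that my example does not specify an aggregation function, so it defaults to the mean. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
To use a pivot to reorient your data the way you describe, you need to include every column that varies with type, which in this case includes "symbol" (note the P vs. C in the codes):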
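To make the effect of the default visible, here is a small sketch using a variant of the toy data in which one (period, type) pair appears twice. The duplicated row and the choice of "first" as the alternative are assumptions made for illustration only.

```python
import pandas as pd

# Variant of the toy frame: period 1 has two "buy" rows.
df = pd.DataFrame({
    "period": [1, 1, 1, 2, 2],
    "product": "kiwi",
    "type": ["buy", "buy", "sell", "buy", "sell"],
    "price": [1.0, 3.0, 2.0, 4.0, 5.0],
})

# No aggfunc given: duplicates are averaged.
mean_out = pd.pivot_table(
    df, index=["period", "product"], columns="type", values="price"
)

# Explicit aggfunc: keep the first value seen in each cell instead.
first_out = pd.pivot_table(
    df, index=["period", "product"], columns="type", values="price",
    aggfunc="first",
)

print(mean_out)
print(first_out)
```

With the default, the two buy prices for period 1 are averaged; with aggfunc="first", only the first one is kept. Which behaviour is appropriate depends on what duplicate rows mean in your data.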
In [10]: pivoted = df.pivot(
...: index=['date_time', 'ticker', 'stock_price', 'impVol', 'strike_price','delta','vega', 'gamma','theta','rho','diff'],
...: columns=['type', 'symbol'],
...: values=['bid', 'ask'],
...: )
In [11]: pivoted
Out[11]:
bid ask
type put call put call
symbol AMZN210226P03330000 AMZN210226C03330000 AMZN210226P03330000 AMZN210226C03330000
date_time ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN 3328.23 NaN 3330.0 NaN NaN NaN NaN NaN 1.77 44.5 43.5 46.85 45.8
2021-02-19 14:28:55 AMZN 3328.23 NaN 3330.0 NaN NaN NaN NaN NaN 1.77 44.5 43.5 46.85 45.8
If you want, you can relabel your columns:
In [12]: pivoted.columns = pd.Index([i[0] + '_' + i[1] for i in pivoted.columns.values])
In [13]: pivoted
Out[13]:
bid_put bid_call ask_put ask_call
date_time ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN 3328.23 NaN 3330.0 NaN NaN NaN NaN NaN 1.77 44.5 43.5 46.85 45.8
2021-02-19 14:28:55 AMZN 3328.23 NaN 3330.0 NaN NaN NaN NaN NaN 1.77 44.5 43.5 46.85 45.8
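Alternatively, you could just exclude symbol from the index, but either way you will need to stack symbol, drop it, or otherwise handle it manually, because its values are not the same for each "type".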
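As a sketch of the "drop it" option, the snippet below rebuilds the four rows from the question, drops symbol, pivots on type, and names the new columns put_bid, put_ask, call_bid, and call_ask as in the desired output. It assumes pandas 1.1 or later (for list arguments to pivot) and, to keep the example short, leaves out the all-NaN columns (impVol and the greeks).

```python
import pandas as pd

# The four rows from the question, without the all-NaN columns.
df = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2021-02-19 14:28:45"] * 2 + ["2021-02-19 14:28:55"] * 2
    ),
    "ticker": "AMZN",
    "stock_price": 3328.23,
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
    "ask": [46.85, 45.80, 45.80, 46.85],
    "symbol": [
        "AMZN210226P03330000", "AMZN210226C03330000",
        "AMZN210226C03330000", "AMZN210226P03330000",
    ],
    "strike_price": 3330.0,
    "diff": 1.77,
})

# Columns that are identical for the put and call rows of a timestamp.
keys = ["date_time", "ticker", "stock_price", "strike_price", "diff"]

# Drop symbol (it differs between put and call), then spread bid/ask by type.
wide = df.drop(columns="symbol").pivot(
    index=keys, columns="type", values=["bid", "ask"]
)

# Flatten the (field, type) column pairs into names like "put_bid".
wide.columns = [f"{kind}_{field}" for field, kind in wide.columns]
wide = wide.reset_index()

# Reorder to match the layout requested in the question.
wide = wide[keys[:3] + ["put_bid", "put_ask", "call_bid", "call_ask"] + keys[3:]]
print(wide)
```

Because pivot raises an error on duplicate index/column combinations rather than aggregating them, this version also acts as a check that each timestamp has at most one put and one call.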