
Pandas: How to groupby a dataframe and convert the rows to columns and consolidate the rows

Here's my data structure:

        date_time             ticker    stock_price     type    bid   ask       impVol               symbol     strike_price  delta  vega  gamma  theta  rho  diff
371     2021-02-19 14:28:45   AMZN      3328.23         put     44.5  46.85     NaN     AMZN210226P03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
370     2021-02-19 14:28:45   AMZN      3328.23         call    43.5  45.80     NaN     AMZN210226C03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
1066    2021-02-19 14:28:55   AMZN      3328.23         call    43.5  45.80     NaN     AMZN210226C03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77
1067    2021-02-19 14:28:55   AMZN      3328.23         put     44.5  46.85     NaN     AMZN210226P03330000     3330.0          NaN   NaN    NaN    NaN  NaN  1.77

My goal is to group by date_time, then create columns for the put's bid and ask and the call's bid and ask.

My expected output would be something like this:

        date_time             ticker    stock_price put_bid   put_ask     call_bid    call_ask    impVol  symbol                     strike_price  delta  vega  gamma  theta  rho  diff
371     2021-02-19 14:28:45   AMZN      3328.23     44.5      46.85       43.5        45.80       NaN     AMZN210226P03330000        3330.0    NaN   NaN    NaN    NaN  NaN  1.77
1066    2021-02-19 14:28:55   AMZN      3328.23     43.5      45.80       44.5        46.85       NaN     AMZN210226C03330000        3330.0    NaN   NaN    NaN    NaN  NaN  1.77

I tried every example I could find, including pivoting like this:

df=pd.pivot_table(df,index=['date_time','type'],columns=df.groupby(['date_time','type']).cumcount().add(1),values=['market_price'],aggfunc='sum')
df.columns=df.columns.map('{0[0]}{0[1]}'.format) 

I think I'm on the right path, but I just can't figure it out. Any help would be incredibly appreciated.

Why are you trying to use a groupby? pandas.pivot() does the grouping for you.

You haven't provided a reproducible example (hint: please do next time), so I made up some random data to explain a possible solution. Note this is not identical to what you need, but it's a starting point:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['period'] = np.repeat([1,2],2)
df['product'] = 'kiwi'
df['type'] = np.tile(['buy','sell'],2)
df['price'] = np.arange(1,5)

out = pd.pivot_table(df, index=['period', 'product'], columns=['type'], values='price')

You need to specify what you want on the left (index), what you want on the top (columns) and which values (values) you want to show for this combination.
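As a rough sketch of how those three arguments might map onto data shaped like the question's: the frame below is a hypothetical cut-down sample (only a few of the original columns), and aggfunc='first' is my own choice on the assumption that each (date_time, type) pair occurs exactly once.

```python
import pandas as pd

# Hypothetical sample mirroring a few of the question's columns.
df = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2021-02-19 14:28:45"] * 2 + ["2021-02-19 14:28:55"] * 2
    ),
    "ticker": "AMZN",
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
    "ask": [46.85, 45.80, 45.80, 46.85],
})

out = pd.pivot_table(
    df,
    index=["date_time", "ticker"],  # what stays on the left
    columns="type",                 # what moves to the top
    values=["bid", "ask"],          # the cells to fill in
    aggfunc="first",                # assumes one row per (index, type) pair
)
print(out)
```

The result has two-level column labels such as ('bid', 'put'), so a single column can be selected with out[('bid', 'put')].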

Also, are you sure the date time will be the same for the put and the call? What if the first two rows are off by even one second: is that possible? And what if the stock price differs between the first and the second row of your table? I don't know your data, so I have no idea whether that can happen, but it's something to think about.
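One way to check those assumptions before pivoting might look like the sketch below. The data is made up (I deliberately shifted one timestamp by a second), and the 10-second bucket is only an assumed tolerance, not something taken from the question.

```python
import pandas as pd

# Made-up rows: the first put/call pair is one second apart.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2021-02-19 14:28:45", "2021-02-19 14:28:46",
        "2021-02-19 14:28:55", "2021-02-19 14:28:55",
    ]),
    "type": ["put", "call", "call", "put"],
    "stock_price": [3328.23, 3328.23, 3328.23, 3328.23],
})

# Count rows per timestamp and type; a complete pair has exactly one of each.
counts = pd.crosstab(df["date_time"], df["type"])
unpaired = counts[(counts["call"] != 1) | (counts["put"] != 1)]
print(unpaired)

# Possible workaround: snap timestamps to a coarser bucket (assumed: 10 seconds).
df["bucket"] = df["date_time"].dt.floor("10s")
bucket_counts = pd.crosstab(df["bucket"], df["type"])

# True for any bucket in which stock_price takes more than one value.
price_varies = df.groupby("bucket")["stock_price"].nunique().gt(1)
print(price_varies)
```

If unpaired is non-empty, pivoting on the raw date_time would leave NaN cells for the missing side of each pair.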

Also note that my example does not specify an aggregate function, so it defaults to the mean. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
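To see what that default means in practice, here is a small sketch with made-up data containing a duplicated (period, type) combination. pivot_table silently averages the duplicates, an explicit aggfunc changes that, and DataFrame.pivot refuses them altogether.

```python
import pandas as pd

# Made-up data: period 1 has two 'buy' rows.
df = pd.DataFrame({
    "period": [1, 1, 1],
    "type": ["buy", "buy", "sell"],
    "price": [1.0, 3.0, 5.0],
})

# Default aggfunc is the mean, so the two 'buy' prices are averaged.
default = pd.pivot_table(df, index="period", columns="type", values="price")

# An explicit aggfunc keeps the first value seen instead.
first = pd.pivot_table(
    df, index="period", columns="type", values="price", aggfunc="first"
)

# DataFrame.pivot does no aggregation and raises on duplicate entries.
try:
    df.pivot(index="period", columns="type", values="price")
except ValueError as err:
    print("pivot refused duplicates:", err)
```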

To use a pivot table to reorient your data the way you're describing, you'll need to include all columns which vary with type, which in this case includes "symbol" (note the P vs. C in the symbol code):

In [10]: pivoted = df.pivot(
    ...:     index=['date_time', 'ticker', 'stock_price', 'impVol', 'strike_price','delta','vega', 'gamma','theta','rho','diff'],
    ...:     columns=['type', 'symbol'],
    ...:     values=['bid', 'ask'],
    ...: )

In [11]: pivoted
Out[11]: 
                                                                                                           bid                                     ask
type                                                                                                       put                call                 put                call
symbol                                                                                     AMZN210226P03330000 AMZN210226C03330000 AMZN210226P03330000 AMZN210226C03330000
date_time           ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77                44.5                43.5               46.85                45.8
2021-02-19 14:28:55 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77                44.5                43.5               46.85                45.8

If you'd like, you could then relabel your columns:

In [12]: pivoted.columns = pd.Index([i[0] + '_' + i[1] for i in pivoted.columns.values])

In [13]: pivoted
Out[13]:
                                                                                            bid_put  bid_call  ask_put  ask_call
date_time           ticker stock_price impVol strike_price delta vega gamma theta rho diff
2021-02-19 14:28:45 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77     44.5      43.5    46.85      45.8
2021-02-19 14:28:55 AMZN   3328.23     NaN    3330.0       NaN   NaN  NaN   NaN   NaN 1.77     44.5      43.5    46.85      45.8
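If you would rather have the put_bid / call_ask ordering from the question's expected output, a variation on the relabeling might look like this. It is only a sketch: the small frame is built by hand to imitate the three-level column labels shown above, rather than produced by the pivot itself.

```python
import pandas as pd

# Hand-built stand-in for the pivoted frame's (field, type, symbol) columns.
cols = pd.MultiIndex.from_tuples(
    [
        ("bid", "put", "AMZN210226P03330000"),
        ("bid", "call", "AMZN210226C03330000"),
        ("ask", "put", "AMZN210226P03330000"),
        ("ask", "call", "AMZN210226C03330000"),
    ],
    names=[None, "type", "symbol"],
)
pivoted = pd.DataFrame([[44.5, 43.5, 46.85, 45.8]], columns=cols)

# Put the option type first and drop the symbol level from the label.
pivoted.columns = [f"{typ}_{field}" for field, typ, _symbol in pivoted.columns]
print(pivoted)
```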

Alternatively, you could leave symbol out of the pivot altogether, but either way you need to stack symbol, drop it, or handle it manually in some way, because its value is not the same for each "type".
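One possible way to "handle it manually" is to pivot symbol as a value alongside bid and ask, so each row keeps both a put_symbol and a call_symbol. This is a sketch on a hypothetical cut-down sample; passing a list to index in DataFrame.pivot needs pandas 1.1 or later, and the flattened column names are my own choice.

```python
import pandas as pd

# Hypothetical sample mirroring a few of the question's columns.
df = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2021-02-19 14:28:45"] * 2 + ["2021-02-19 14:28:55"] * 2
    ),
    "ticker": "AMZN",
    "type": ["put", "call", "call", "put"],
    "bid": [44.5, 43.5, 43.5, 44.5],
    "ask": [46.85, 45.80, 45.80, 46.85],
    "symbol": [
        "AMZN210226P03330000", "AMZN210226C03330000",
        "AMZN210226C03330000", "AMZN210226P03330000",
    ],
})

# Treat symbol as a value, so it is spread per type instead of widening the labels.
wide = df.pivot(
    index=["date_time", "ticker"],
    columns="type",
    values=["bid", "ask", "symbol"],
)

# Flatten the (field, type) labels into names like 'put_bid' and 'call_symbol'.
wide.columns = [f"{typ}_{field}" for field, typ in wide.columns]

# Mixing numeric and string values can leave object columns; re-infer the dtypes.
wide = wide.infer_objects().reset_index()
print(wide)
```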


 