Pandas - How to scale a column in a multi-index dataframe to the top row of each level=0 group

Question

I have a multi-index dataframe dfu:

                      open   high     low   close
Date       Time
2016-11-28 09:43:00  26.03  26.03  26.030  26.030
           09:48:00  25.90  25.90  25.760  25.760
           09:51:00  26.00  26.00  25.985  25.985
2016-11-29 09:30:00  24.98  24.98  24.98  24.9800
           09:33:00  25.00  25.00  24.99  24.9900
           09:35:00  25.33  25.46  25.33  25.4147

I would like to create a new column, ['closeScaled'], calculated by a function foo that takes two arguments: the ['open'] value from the first row of the current level=0 group, and the current row's ['close'] value. I suspect the solution will look something like:

dfu['closeScaled']=dfu.apply(lambda x: foo(*get first row of current date*[0],x[3]))

I just can't seem to figure out the "get first row of current level=0" part.

If foo is:

def foo(firstOpen,currentClose):
    return (currentClose / firstOpen)

then I would expect the closeScaled column to contain (truncating to 4 decimals):

                      open   high     low   close  closeScaled
Date       Time
2016-11-28 09:43:00  26.03  26.03  26.030  26.030  1.0000
           09:48:00  25.90  25.90  25.760  25.760  0.9896
           09:51:00  26.00  26.00  25.985  25.985  0.9982
2016-11-29 09:30:00  24.98  24.98  24.98  24.9800  1.0000
           09:33:00  25.00  25.00  24.99  24.9900  1.0004
           09:35:00  25.33  25.46  25.33  25.4147  1.0174

Answer 1

You can divide close with div by the Series that groupby with transform('first') creates, and then round the result:

print (dfu.groupby(level=0)['open'].transform('first'))
Date        Time    
2016-11-28  09:43:00    26.03
            09:48:00    26.03
            09:51:00    26.03
2016-11-29  09:30:00    24.98
            09:33:00    24.98
            09:35:00    24.98
Name: open, dtype: float64
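
To see why transform is used here rather than a plain aggregation, a small sketch may help. It rebuilds only the open column from the question; the names opens, per_group and broadcast are mine, not from the post. first() collapses each date to a single value, while transform('first') repeats that value on every row of its group, so the result has the same index as the original and can be divided row by row.

```python
import pandas as pd

# Sample index and 'open' values copied from the question.
index = pd.MultiIndex.from_tuples(
    [
        ("2016-11-28", "09:43:00"),
        ("2016-11-28", "09:48:00"),
        ("2016-11-28", "09:51:00"),
        ("2016-11-29", "09:30:00"),
        ("2016-11-29", "09:33:00"),
        ("2016-11-29", "09:35:00"),
    ],
    names=["Date", "Time"],
)
opens = pd.Series([26.03, 25.90, 26.00, 24.98, 25.00, 25.33], index=index, name="open")

# Aggregation: one value per Date (2 rows).
per_group = opens.groupby(level=0).first()

# Transformation: the per-group value repeated on every row (6 rows).
broadcast = opens.groupby(level=0).transform("first")

print(per_group)
print(broadcast)
```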

dfu['closeScaled'] = dfu.close.div(dfu.groupby(level=0)['open'].transform('first')).round(4)
print (dfu)
                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9983
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174

If you need to truncate the float values to 4 decimals instead of rounding:

First multiply by 10000, convert to int and divide by 10000.

dfu['closeScaled'] = (dfu.close.div(dfu.groupby(level=0)['open'].transform('first'))
                         .mul(10000).astype(int).div(10000))
print (dfu)
                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9982
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174

#http://stackoverflow.com/a/783927/2901002
def truncate(f, n):
    '''Truncates/pads a float f to n decimal places without rounding'''
    s = '{}'.format(f)
    if 'e' in s or 'E' in s:
        return '{0:.{1}f}'.format(f, n)
    i, p, d = s.partition('.')
    return '.'.join([i, (d+'0'*n)[:n]])

dfu['closeScaled'] = (dfu.close.div(dfu.groupby(level=0)['open'].transform('first'))
                         .apply(lambda x: truncate(x,4)).astype(float))
print (dfu)
                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9982
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174

Answer 2

Using groupby + apply + lambda:

dfu.groupby(level=0, group_keys=False).apply(
    lambda g: g.assign(closeScaled=g.close.div(g.open.iloc[0]).round(4))
)

                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9983
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174
