
Pandas - How to scale a column in a multi-index dataframe to the top row in each level=0 group

I have a multi-index dataframe, dfu:

                      open   high     low   close
Date       Time
2016-11-28 09:43:00  26.03  26.03  26.030  26.030
           09:48:00  25.90  25.90  25.760  25.760
           09:51:00  26.00  26.00  25.985  25.985
2016-11-29 09:30:00  24.98  24.98  24.98  24.9800
           09:33:00  25.00  25.00  24.99  24.9900
           09:35:00  25.33  25.46  25.33  25.4147

I would like to create a new column, ['closeScaled'], that is calculated by calling a function, foo, with two arguments: the ['open'] value from the first row of the current level=0 group, and the ['close'] value of the current row. I suspect the solution will involve something looking like:

dfu['closeScaled']=dfu.apply(lambda x: foo(*get first row of current date*[0],x[3]))

I just can't seem to figure out the "get first row of current level=0" part.

If foo is:

def foo(firstOpen,currentClose):
    return (currentClose / firstOpen)

then I would expect the closeScaled column to contain (truncating to 4 decimals):

                      open   high     low   close  closeScaled
Date       Time
2016-11-28 09:43:00  26.03  26.03  26.030  26.030  1.0000
           09:48:00  25.90  25.90  25.760  25.760  0.9896
           09:51:00  26.00  26.00  25.985  25.985  0.9982
2016-11-29 09:30:00  24.98  24.98  24.98  24.9800  1.0000
           09:33:00  25.00  25.00  24.99  24.9900  1.0004
           09:35:00  25.33  25.46  25.33  25.4147  1.0174

You can divide close, using div, by the Series created by groupby with transform('first'), and then round the result:

print (dfu.groupby(level=0)['open'].transform('first'))
Date        Time    
2016-11-28  09:43:00    26.03
            09:48:00    26.03
            09:51:00    26.03
2016-11-29  09:30:00    24.98
            09:33:00    24.98
            09:35:00    24.98
Name: open, dtype: float64

dfu['closeScaled'] = dfu.close.div(dfu.groupby(level=0)['open'].transform('first')).round(4)
print (dfu)
                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9983
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174
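For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the values in the question, assuming the Date and Time index levels are plain strings (the original post does not show their dtypes), and then applies the same groupby + transform('first') step.

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: both index levels are plain strings; the post does not say.
index = pd.MultiIndex.from_tuples(
    [
        ("2016-11-28", "09:43:00"),
        ("2016-11-28", "09:48:00"),
        ("2016-11-28", "09:51:00"),
        ("2016-11-29", "09:30:00"),
        ("2016-11-29", "09:33:00"),
        ("2016-11-29", "09:35:00"),
    ],
    names=["Date", "Time"],
)
dfu = pd.DataFrame(
    {
        "open": [26.03, 25.90, 26.00, 24.98, 25.00, 25.33],
        "high": [26.03, 25.90, 26.00, 24.98, 25.00, 25.46],
        "low": [26.030, 25.760, 25.985, 24.98, 24.99, 25.33],
        "close": [26.030, 25.760, 25.985, 24.98, 24.99, 25.4147],
    },
    index=index,
)

# transform('first') returns a Series with the same index as dfu, where
# every row holds the first 'open' of its Date group.
first_open = dfu.groupby(level="Date")["open"].transform("first")

# Element-wise division aligns on the shared index, then round to 4 places.
dfu["closeScaled"] = dfu["close"].div(first_open).round(4)
print(dfu)
```

Because transform keeps the original index, the division lines up row by row and no explicit loop over the groups is needed.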

If you need to truncate the float values to 4 decimals instead of rounding:

First multiply by 10000, convert to int, and then divide by 10000.

# The outer parentheses let the method chain continue on the next line
dfu['closeScaled'] = (dfu.close.div(dfu.groupby(level=0)['open'].transform('first'))
                         .mul(10000).astype(int).div(10000))
print (dfu)
                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9982
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174
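As a variation on the multiply/int/divide approach above, numpy.trunc can stand in for the integer cast. This is only a sketch, not part of the original answer: the ratio values below are taken from the first Date group in the question, and the names ratio, truncated and rounded are made up for illustration.

```python
import numpy as np
import pandas as pd

# close / first open for the three rows of 2016-11-28 in the question.
ratio = pd.Series([26.030 / 26.03, 25.760 / 26.03, 25.985 / 26.03])

# np.trunc drops the fractional part toward zero and keeps the float dtype,
# so missing values stay NaN instead of failing an integer cast.
truncated = np.trunc(ratio * 10000) / 10000

# For comparison: rounding moves the third value up, truncation does not.
rounded = ratio.round(4)

print(pd.DataFrame({"ratio": ratio, "truncated": truncated, "rounded": rounded}))
```

One caveat that applies to both variants: the truncation happens on binary floats, so a ratio that is mathematically an exact 4-decimal value may be stored slightly below it and lose one unit in the last place.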

#http://stackoverflow.com/a/783927/2901002
def truncate(f, n):
    '''Truncates/pads a float f to n decimal places without rounding'''
    s = '{}'.format(f)
    if 'e' in s or 'E' in s:
        return '{0:.{1}f}'.format(f, n)
    i, p, d = s.partition('.')
    return '.'.join([i, (d+'0'*n)[:n]])

# The outer parentheses let the method chain continue on the next line
dfu['closeScaled'] = (dfu.close.div(dfu.groupby(level=0)['open'].transform('first'))
                         .apply(lambda x: truncate(x, 4)).astype(float))
print (dfu)
                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9982
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174

Using groupby + apply + lambda

# group_keys=False keeps the original (Date, Time) index on recent pandas versions
dfu.groupby(level=0, group_keys=False).apply(
    lambda g: g.assign(closeScaled=g.close.div(g.open.iloc[0]).round(4))
)

                      open   high     low    close  closeScaled
Date       Time                                                
2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
           09:48:00  25.90  25.90  25.760  25.7600       0.9896
           09:51:00  26.00  26.00  25.985  25.9850       0.9983
2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
           09:33:00  25.00  25.00  24.990  24.9900       1.0004
           09:35:00  25.33  25.46  25.330  25.4147       1.0174
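The question asks for a general foo(firstOpen, currentClose), not only division. One way to get there, sketched here on a reduced four-row version of the sample data (only the open and close columns, string index levels assumed), is to build the per-group first open with transform and pass both Series to foo. The row-wise fallback and the names first_open and row_wise are illustrative additions, not part of the answers above.

```python
import pandas as pd


def foo(firstOpen, currentClose):
    # Same foo as in the question; works on scalars and on aligned Series.
    return currentClose / firstOpen


# Reduced sample: two rows per Date, open/close only (values from the question).
index = pd.MultiIndex.from_tuples(
    [
        ("2016-11-28", "09:43:00"),
        ("2016-11-28", "09:48:00"),
        ("2016-11-29", "09:30:00"),
        ("2016-11-29", "09:33:00"),
    ],
    names=["Date", "Time"],
)
dfu = pd.DataFrame(
    {"open": [26.03, 25.90, 24.98, 25.00], "close": [26.03, 25.76, 24.98, 24.99]},
    index=index,
)

# First 'open' of each level=0 group, repeated for every row of that group.
first_open = dfu.groupby(level=0)["open"].transform("first")

# Vectorised call: foo receives two index-aligned Series.
dfu["closeScaled"] = foo(first_open, dfu["close"]).round(4)

# Fallback for a foo that only accepts scalars: call it once per row.
row_wise = pd.Series(
    [foo(o, c) for o, c in zip(first_open, dfu["close"])],
    index=dfu.index,
).round(4)

print(dfu)
```

The vectorised call is usually the better choice when foo consists of arithmetic that pandas can apply element-wise. The per-row loop is slower but makes no assumptions about what foo does internally.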
