
Apply custom cumulative function to pandas dataframe

I have a dataframe sorted by date:

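import pandas as pd

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
# DataFrame.sort was removed from later pandas versions; sort_values replaces it.
# The order of rows sharing a date may vary with the pandas version.
df = df.sort_values('date')
print(df)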

         date  idx  pct_val  val
3  2016-04-30    2      NaN   10
0  2016-04-30    1      NaN   10
4  2016-05-31    2      -10    0
1  2016-05-31    1      -10    0
5  2016-06-31    2      -10    0
2  2016-06-31    1      NaN    5

I want to group by idx and then apply a cumulative function with some simple logic. If pct_val is null, add val to the running total; otherwise multiply the running total by 1 + pct_val/100. In the table below, 'cumsum' shows the result of df.groupby('idx').val.cumsum() and 'cumulative_func' is the result I want.

         date  idx  pct_val  val  cumsum  cumulative_func
3  2016-04-30    2      NaN   10      10               10
0  2016-04-30    1      NaN   10      10               10
4  2016-05-31    2      -10    0      10                9
1  2016-05-31    1      -10    0      10                9
5  2016-06-31    2      -10    0      10                8
2  2016-06-31    1      NaN    5      15               14

Any idea whether there is a way to apply a custom cumulative function to a dataframe, or a better way to achieve this?

I don't believe there is an easy way to accomplish your objective using vectorization. I would first try to get something working, and then optimize for speed if required.

import numpy as np
import pandas as pd

def cumulative_func(df):
    results = []
    # groups maps each idx value to the index labels of its rows
    for group in df.groupby('idx').groups.values():
        total = 0
        result = []
        # .loc replaces the removed .ix indexer
        for p, v in df.loc[group, ['pct_val', 'val']].values:
            if np.isnan(p):
                total += v
            else:
                total *= (1 + .01 * p)
            result.append(total)
        results.append(pd.Series(result, index=group))
    # restore the original row order
    return pd.concat(results).reindex(df.index)

df['cumulative_func'] = cumulative_func(df)

>>> df
         date  idx  pct_val  val  cumulative_func
3  2016-04-30    2      NaN   10             10.0
0  2016-04-30    1      NaN   10             10.0
4  2016-05-31    2      -10    0              9.0
1  2016-05-31    1      -10    0              9.0
5  2016-06-31    2      -10    0              8.1
2  2016-06-31    1      NaN    5             14.0
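
As a possible follow-up on the speed question: the update rule is a linear recurrence, total_t = m_t * total_(t-1) + a_t, where m_t is 1 on rows with a null pct_val and 1 + pct_val/100 otherwise, and a_t is val on null rows and 0 otherwise. Writing M_t for the running product of m within a group, the solution is total_t = M_t * cumsum(a_t / M_t). The sketch below applies that identity with grouped cumprod and cumsum. It assumes pct_val is never -100 (which would make M_t zero and the division invalid), and `cumulative_vectorized` is a hypothetical name, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
df = df.sort_values(['date', 'idx'])


def cumulative_vectorized(df):
    """Running total per idx; assumes pct_val is never -100."""
    pct = df['pct_val'].astype(float)
    is_add = pct.isnull()
    # multiplier: 1 on "add" rows, 1 + pct/100 on "multiply" rows
    mult = (1 + pct / 100.0).where(~is_add, 1.0)
    # addend: val on "add" rows, 0 on "multiply" rows
    add = df['val'].astype(float).where(is_add, 0.0)
    # running product of multipliers within each idx group
    cum_mult = mult.groupby(df['idx']).cumprod()
    # closed form of the linear recurrence
    return cum_mult * (add / cum_mult).groupby(df['idx']).cumsum()


df['cumulative_func'] = cumulative_vectorized(df)
```

Because it divides by a running product, this form can lose precision when many large negative percentages accumulate, so the explicit loop remains the safer reference.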

First I cleaned up your setup.

Setup

import pandas as pd

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
df = df.sort_values(['date', 'idx'])
print(df)

Looks like:

         date  idx  pct_val  val
0  2016-04-30    1      NaN   10
3  2016-04-30    2      NaN   10
1  2016-05-31    1    -10.0    0
4  2016-05-31    2    -10.0    0
2  2016-06-31    1      NaN    5
5  2016-06-31    2    -10.0    0

Solution

def cumcustom(df):
    # work on a copy so the caller's frame is not modified
    df = df.copy()
    running_total = 0
    for idx, row in df.iterrows():
        # plain [] lookup replaces the removed .ix indexer
        if pd.isnull(row['pct_val']):
            running_total += row['val']
        else:
            running_total *= row['pct_val'] / 100. + 1
        df.loc[idx, 'cumcustom'] = running_total
    return df

Then apply

df.groupby('idx').apply(cumcustom).reset_index(drop=True).sort_values(['date', 'idx'])

Looks like:

         date  idx  pct_val  val  cumcustom
0  2016-04-30    1      NaN   10       10.0
3  2016-04-30    2      NaN   10       10.0
1  2016-05-31    1    -10.0    0        9.0
4  2016-05-31    2    -10.0    0        9.0
2  2016-06-31    1      NaN    5       14.0
5  2016-06-31    2    -10.0    0        8.1
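
The way groupby.apply treats group keys and the grouping column has changed between pandas versions, so the same per-group loop can also be written by iterating over the groups directly. This is only a sketch under that motivation; `cumcustom_values` and `out` are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
df = df.sort_values(['date', 'idx'])


def cumcustom_values(group):
    """Return the running totals for one group as a list."""
    running_total = 0.0
    totals = []
    for pct, val in zip(group['pct_val'], group['val']):
        if pd.isnull(pct):
            running_total += val                # null pct_val: add val
        else:
            running_total *= pct / 100. + 1     # otherwise: scale the total
        totals.append(running_total)
    return totals


out = df.copy()
out['cumcustom'] = 0.0
# iterate over the groups of the original frame and write into the copy
for _, group in df.groupby('idx'):
    out.loc[group.index, 'cumcustom'] = cumcustom_values(group)
```

Writing through group.index keeps the original row labels and the date order, so no reset_index or re-sort is needed afterwards.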

