简体   繁体   中英

Apply custom cumulative function to pandas dataframe

I have a dataframe sorted by date :

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
df = df.sort('date')
print df

         date  idx  pct_val  val
3  2016-04-30    2      NaN   10
0  2016-04-30    1      NaN   10
4  2016-05-31    2      -10    0
1  2016-05-31    1      -10    0
5  2016-06-31    2      -10    0
2  2016-06-31    1      NaN    5

And I want to group by idx then apply a cumulative function with some simple logic. If pct_val is null, add val to to running total, otherwise multiply running total by 1 + pct_val/100 . 'cumsum' shows the result of df.groupby('idx').val.cumsum() and 'cumulative_func' is the result I want.

         date  idx  pct_val  val  cumsum  cumulative_func
3  2016-04-30    2      NaN   10      10               10
0  2016-04-30    1      NaN   10      10               10
4  2016-05-31    2      -10    0      10                9
1  2016-05-31    1      -10    0      10                9
5  2016-06-31    2      -10    0      10                8
2  2016-06-31    1      NaN    5      15               14

Any idea if there is a way to do apply a custom cumulative function to a dataframe or a better way to achieve this?

I don't believe there is an easy way to accomplish your objective using vectorization. I would first try to get something working, and then optimize for speed if required.

def cumulative_func(df):
    results = []
    for group in df.groupby('idx').groups.itervalues():
        total = 0
        result = []
        for p, v in df.ix[group, ['pct_val', 'val']].values:
            if np.isnan(p):
                total += v
            else:
                total *= (1 + .01 * p)
            result.append(total)
        results.append(pd.Series(result, index=group))
    return pd.concat(results).reindex(df.index)

df['cumulative_func'] = cumulative_func(df)

>>> df
         date  idx  pct_val  val  cumulative_func
3  2016-04-30    2      NaN   10             10.0
0  2016-04-30    1      NaN   10             10.0
4  2016-05-31    2      -10    0              9.0
1  2016-05-31    1      -10    0              9.0
5  2016-06-31    2      -10    0              8.1
2  2016-06-31    1      NaN    5             14.0

First I cleaned up your setup

Setup

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
df = df.sort_values(['date', 'idx'])
print df

Looks like:

         date  idx  pct_val  val
0  2016-04-30    1      NaN   10
3  2016-04-30    2      NaN   10
1  2016-05-31    1    -10.0    0
4  2016-05-31    2    -10.0    0
2  2016-06-31    1      NaN    5
5  2016-06-31    2    -10.0    0

Solution

def cumcustom(df):
    df = df.copy()
    running_total = 0
    for idx, row in df.iterrows():
        if pd.isnull(row.ix['pct_val']):
            running_total += row.ix['val']
        else:
            running_total *= row.ix['pct_val'] / 100. + 1
        df.loc[idx, 'cumcustom'] = running_total
    return df

Then apply

df.groupby('idx').apply(cumcustom).reset_index(drop=True).sort_values(['date', 'idx'])

Looks like:

         date  idx  pct_val  val  cumcustom
0  2016-04-30    1      NaN   10       10.0
3  2016-04-30    2      NaN   10       10.0
1  2016-05-31    1    -10.0    0        9.0
4  2016-05-31    2    -10.0    0        9.0
2  2016-06-31    1      NaN    5       14.0
5  2016-06-31    2    -10.0    0        8.1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM