
How do you run more complex aggregation functions on groupby in Python?

Beginner with Python here, using Spyder for some finance-related work. I could use some guidance on the below. I haven't attempted any code yet because I don't know where to start.

I have a pandas DataFrame that is organised like so:

Month, portfolio name, return

Month is in the format 201501, 201502. Return is a percentage. I have several hundred thousand observations.

I would like to produce stats at a portfolio name level each morning - total return, return over specified periods, rolling vol numbers, max drawdown over the full period etc.

At a high level, what is the best way to do this? I think I need to group by portfolio name and apply different functions, which either exist in some stats or finance package or which I can write myself, but I can't find good examples of this in practice. Also, for something like max drawdown over the period, the order of the observations matters. Do I need to do something different when grouping, or will pandas read the rows in order as long as the month is a datetime?

Again, looking for general advice on the above, and pointers in the right direction. Perhaps when I get close I can troubleshoot here with code.

Thanks in advance for any replies. This is my first post; this site has helped me stay in a job answering Excel questions for over 6 years.

Sample of data
MONTH,PORTFOLIO_NAME,RETURN
201501,PORT1,0.014
201502,PORT1,0.0034
201503,PORT1,-0.0045
201501,PORT2,0.012
201502,PORT2,0.0054
201503,PORT2,-0.0174

EDIT: I learned about the rolling() and expanding() features of pandas and understand it all a lot more now. I hadn't been using agg() for custom functions either. See below how I created an aggregate for each portfolio for a couple of different metrics. I'll make a new post for any specific questions. Thanks.

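import numpy as np
import pandas as pd

# Each function receives one portfolio's RETURN values, assumed sorted by MONTH.
def rolled_ret(arr):
    # Compound the periodic returns into a single total return.
    return arr.add(1).prod() - 1
def ann_vol(arr):
    # Annualise monthly volatility; np.std defaults to ddof=0.
    return np.std(arr) * np.sqrt(12)
def max_drawdown(arr):
    # Largest fall of the cumulative wealth index below its running peak.
    wealth = arr.add(1).cumprod()
    return (wealth / wealth.cummax() - 1).min()

full_return = df_final.groupby('PORTFOLIO_NAME')['RETURN'].agg(
    full_period_returns=rolled_ret,
    annualised_vol=ann_vol,
    MDD=max_drawdown,
)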
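
On the ordering question: as far as I can tell, groupby keeps the rows of each group in the order they appear in the frame, so sorting by portfolio and month beforehand should be enough for order-dependent metrics. Below is a minimal sketch of that, assuming the sample rows from the question stand in for the real data and that drawdown is measured against the running peak of the cumulative wealth index. The column names WEALTH, PEAK and DRAWDOWN are made up for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; the real frame is not shown there.
csv_text = """MONTH,PORTFOLIO_NAME,RETURN
201501,PORT1,0.014
201502,PORT1,0.0034
201503,PORT1,-0.0045
201501,PORT2,0.012
201502,PORT2,0.0054
201503,PORT2,-0.0174
"""
df = pd.read_csv(io.StringIO(csv_text))

# Sort first so every portfolio's rows are in chronological order.
df = df.sort_values(["PORTFOLIO_NAME", "MONTH"]).reset_index(drop=True)

# Cumulative wealth index per portfolio (growth of 1 unit invested).
df["WEALTH"] = df["RETURN"].add(1).groupby(df["PORTFOLIO_NAME"]).cumprod()

# Running peak of the wealth index, again per portfolio.
df["PEAK"] = df.groupby("PORTFOLIO_NAME")["WEALTH"].cummax()

# Drawdown at each month: how far the index sits below its peak so far.
df["DRAWDOWN"] = df["WEALTH"] / df["PEAK"] - 1

# The maximum drawdown is the most negative value on that path.
mdd = df.groupby("PORTFOLIO_NAME")["DRAWDOWN"].min()
print(mdd)
```

Keeping the whole drawdown path as a column, rather than only the minimum, makes it easy to check by eye that the ordering is right.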

Here's how I would approach it:

# Convert MONTH (e.g. 201501) to datetime
df["MONTH"] = pd.to_datetime(df["MONTH"], format="%Y%m")

# Create a new column holding the monthly period
df["PERIOD"] = df["MONTH"].dt.to_period("M")

# Now aggregate RETURN per portfolio and period:
df.groupby(["PORTFOLIO_NAME", "PERIOD"])[["RETURN"]].agg([("total", "sum"), ("min", "min"), ("max", "max")])


|                                   |   ('RETURN', 'total') |   ('RETURN', 'min') |   ('RETURN', 'max') |
|:----------------------------------|----------------------:|--------------------:|--------------------:|
| ('PORT1', Period('2015-01', 'M')) |                0.014  |              0.014  |              0.014  |
| ('PORT1', Period('2015-02', 'M')) |                0.0034 |              0.0034 |              0.0034 |
| ('PORT1', Period('2015-03', 'M')) |               -0.0045 |             -0.0045 |             -0.0045 |
| ('PORT2', Period('2015-01', 'M')) |                0.012  |              0.012  |              0.012  |
| ('PORT2', Period('2015-02', 'M')) |                0.0054 |              0.0054 |              0.0054 |
| ('PORT2', Period('2015-03', 'M')) |               -0.0174 |             -0.0174 |             -0.0174 |
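
Since the sample has one row per portfolio and month, total, min and max are identical in the table above. For the rolling volatility and "return over a specified period" parts of the question, a rough sketch might look like the following. The window length, the start and end dates, and the ROLLING_VOL column name are assumptions chosen to fit the tiny sample, not values from the question.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question.
csv_text = """MONTH,PORTFOLIO_NAME,RETURN
201501,PORT1,0.014
201502,PORT1,0.0034
201503,PORT1,-0.0045
201501,PORT2,0.012
201502,PORT2,0.0054
201503,PORT2,-0.0174
"""
df = pd.read_csv(io.StringIO(csv_text))
df["MONTH"] = pd.to_datetime(df["MONTH"].astype(str), format="%Y%m")
df = df.sort_values(["PORTFOLIO_NAME", "MONTH"]).reset_index(drop=True)

# Hypothetical window; with real data 12 or 36 months would be more usual.
WINDOW = 2

# Rolling sample standard deviation within each portfolio, annualised.
# groupby().rolling() adds the group key as an extra index level, so it is
# dropped again to align the result with the original rows.
df["ROLLING_VOL"] = (
    df.groupby("PORTFOLIO_NAME")["RETURN"]
    .rolling(WINDOW)
    .std()
    .mul(np.sqrt(12))
    .reset_index(level=0, drop=True)
)

# Compounded return over an assumed period (February to March 2015).
start, end = pd.Timestamp("2015-02-01"), pd.Timestamp("2015-03-01")
in_period = df["MONTH"].between(start, end)
period_ret = (
    df.loc[in_period]
    .groupby("PORTFOLIO_NAME")["RETURN"]
    .agg(lambda r: r.add(1).prod() - 1)
)

print(df[["PORTFOLIO_NAME", "MONTH", "ROLLING_VOL"]])
print(period_ret)
```

The first WINDOW - 1 rows of each portfolio come out as NaN because the window is not yet full. Note that rolling().std() uses the sample standard deviation (ddof=1), which differs from the np.std default.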


 