
How do you run more complex aggregation functions on groupby in Python?

Python beginner here, using Spyder for some finance-related work. I could use some guidance on the problem below. I don't have any attempted code yet because I don't know where to start.

I have a pandas DataFrame organised like so:

Month, portfolio name, return

Month is in the format 201501, 201502. Return is a percentage. I have several hundred thousand observations.

I would like to produce stats at the portfolio-name level each morning: total return, return over specified periods, rolling vol numbers, max drawdown over the full period, etc.

At a high level, what is the best way to do this? I think I need to group by portfolio name and apply different functions, either ones that exist in a stats or finance package or ones I write myself, but I can't find good examples of this in practice. Also, for something like max drawdown over the period, the order of the observations matters. Do I need to do something different when grouping, or will pandas read the rows in order as long as the month is a datetime?

Again, I'm looking for general advice on the above and pointers in the right direction. Once I get closer, I can troubleshoot here with code.

Thanks in advance for any replies. This is my first post; this site has helped me stay in a job by answering Excel questions for over six years.

Sample of data:

    MONTH,PORTFOLIO_NAME,RETURN
    201501,PORT1,0.014
    201502,PORT1,0.0034
    201503,PORT1,-0.0045
    201501,PORT2,0.012
    201502,PORT2,0.0054
    201503,PORT2,-0.0174

EDIT: I learned about the rolling() and expanding() features of pandas and understand it all a lot better now. I also hadn't been using agg() for custom functions. See below how I created an aggregate for each portfolio across a couple of different metrics. I'll make a new post for any specific questions I have. Thanks.

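    import numpy as np
    import pandas as pd

    def rolled_ret(arr):
        # Compound the periodic returns into a single total return
        return arr.add(1).prod() - 1
    def ann_vol(arr):
        # Annualise the standard deviation of monthly returns
        return np.std(arr) * np.sqrt(12)
    def max_drawdown(arr):
        # Largest fall of the cumulative wealth index from its running peak
        # (diff().min() would only give the worst single-month move)
        wealth = arr.add(1).cumprod()
        return (wealth / wealth.cummax() - 1).min()

    # Assumes df_final is already sorted by MONTH within each portfolio
    full_return = df_final.groupby('PORTFOLIO_NAME')['RETURN'].agg(
        full_period_returns=rolled_ret,
        annualised_vol=ann_vol,
        MDD=max_drawdown,
    )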
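On the ordering question: groupby keeps rows in their original order within each group, so sorting by portfolio and month before grouping should be enough for path-dependent statistics. The following is a minimal sketch, not a definitive implementation, that builds a month-by-month drawdown series from the sample data above. The column names `WEALTH`, `PEAK` and `DRAWDOWN` are illustrative assumptions, and the wealth index here starts at the first observed month rather than at an initial value of 1.

```python
import pandas as pd

# Sample rows from the question, deliberately shuffled to show why sorting matters.
df = pd.DataFrame({
    "MONTH": [201503, 201501, 201502, 201502, 201503, 201501],
    "PORTFOLIO_NAME": ["PORT1", "PORT1", "PORT1", "PORT2", "PORT2", "PORT2"],
    "RETURN": [-0.0045, 0.014, 0.0034, 0.0054, -0.0174, 0.012],
})

# Integer YYYYMM values sort chronologically, so no datetime conversion is needed here.
df = df.sort_values(["PORTFOLIO_NAME", "MONTH"]).reset_index(drop=True)

# Cumulative growth of one unit invested in each portfolio.
df["WEALTH"] = df["RETURN"].add(1).groupby(df["PORTFOLIO_NAME"]).cumprod()

# Running peak of the wealth index within each portfolio.
df["PEAK"] = df.groupby("PORTFOLIO_NAME")["WEALTH"].cummax()

# Drawdown: fractional distance below the running peak (0 at a new high).
df["DRAWDOWN"] = df["WEALTH"] / df["PEAK"] - 1

# Maximum drawdown per portfolio is the most negative drawdown value.
max_drawdown = df.groupby("PORTFOLIO_NAME")["DRAWDOWN"].min()
print(max_drawdown)
```

Keeping the full `DRAWDOWN` column, rather than only the minimum, makes it possible to see when each drawdown started and ended.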

Here's how I would approach it:

    import pandas as pd

    # Convert MONTH (e.g. 201501) to datetime; cast to str so the format string applies
    df["MONTH"] = pd.to_datetime(df["MONTH"].astype(str), format="%Y%m")

    # Create a new column with your period
    df["PERIOD"] = df["MONTH"].dt.to_period("M")

    # Now aggregate RETURN per portfolio and period
    df.groupby(["PORTFOLIO_NAME", "PERIOD"])[["RETURN"]].agg([("total", "sum"), ("min", "min"), ("max", "max")])


|                                   |   ('RETURN', 'total') |   ('RETURN', 'min') |   ('RETURN', 'max') |
|:----------------------------------|----------------------:|--------------------:|--------------------:|
| ('PORT1', Period('2015-01', 'M')) |                0.014  |              0.014  |              0.014  |
| ('PORT1', Period('2015-02', 'M')) |                0.0034 |              0.0034 |              0.0034 |
| ('PORT1', Period('2015-03', 'M')) |               -0.0045 |             -0.0045 |             -0.0045 |
| ('PORT2', Period('2015-01', 'M')) |                0.012  |              0.012  |              0.012  |
| ('PORT2', Period('2015-02', 'M')) |                0.0054 |              0.0054 |              0.0054 |
| ('PORT2', Period('2015-03', 'M')) |               -0.0174 |             -0.0174 |             -0.0174 |
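
The question also mentions rolling volatility and returns over specified periods. As a rough sketch under stated assumptions, `groupby(...).rolling(...)` can compute these per portfolio. The window length `WINDOW = 2` is chosen only so that the three-month sample produces output (a real report would more likely use something like 12 or 36 months), and the `ROLLING_VOL` and `ROLLING_RET` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "MONTH": [201501, 201502, 201503, 201501, 201502, 201503],
    "PORTFOLIO_NAME": ["PORT1"] * 3 + ["PORT2"] * 3,
    "RETURN": [0.014, 0.0034, -0.0045, 0.012, 0.0054, -0.0174],
})

WINDOW = 2  # assumption for illustration only

# Rolling windows follow row order, so sort within each portfolio first.
df = df.sort_values(["PORTFOLIO_NAME", "MONTH"]).reset_index(drop=True)

rolling = df.groupby("PORTFOLIO_NAME")["RETURN"].rolling(WINDOW)

# Annualised rolling volatility (sample standard deviation, ddof=1).
rolling_vol = rolling.std().mul(np.sqrt(12))

# Compounded return over each trailing window.
rolling_ret = rolling.apply(lambda r: np.prod(1 + r) - 1, raw=True)

# The results are indexed by (PORTFOLIO_NAME, original row index);
# dropping the first level aligns them with the rows of df.
df["ROLLING_VOL"] = rolling_vol.reset_index(level=0, drop=True)
df["ROLLING_RET"] = rolling_ret.reset_index(level=0, drop=True)
print(df)
```

The first `WINDOW - 1` rows of each portfolio are NaN because the window is not yet full; passing `min_periods` to `rolling` changes that behaviour.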
