
Using multiple functions with pandas transform

I have a dataset that looks like this:

   entity_id transaction_date transaction_month  net_flow    inflow   outflow
0         51       2018-07-02        2018-07-01  10161.06  20161.06  10000.00
1         51       2018-07-03        2018-07-01   5823.73   5867.37     43.64
2         51       2018-07-05        2018-07-01  17835.79  24107.29   6271.50
3         51       2018-07-06        2018-07-01  -3544.72  31782.84  35327.56
4         51       2018-07-09        2018-07-01  18252.42  18332.42     80.00

I am trying to calculate rolling metrics across the entity_id field using rolling and transform. I have multiple variables I'd like to create and would prefer to compute them in a single call.

For example, if I were to create these measures using agg, I would execute something like this:

transactions = (
    raw_transactions
    .groupby(['entity_id','transaction_month'])[['inflow','outflow']]
    .agg([
        'sum','skew',
        ( 'coef_var', lambda x: x.std() / x.mean() ),
        ( 'kurtosis', lambda x: x.kurtosis() )
        ])
    .reset_index()
)

However, I'm unable to reproduce this using transform. When I try to pass the functions as either a dict or a list, I get a TypeError because lists and dicts are unhashable.

>>> transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-62-4ef49d836b3f> in <module>
----> 1 transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])

/jupyter/packages/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
   1354 
   1355         # optimized transforms
-> 1356         func = self._get_cython_func(func) or func
   1357 
   1358         if not isinstance(func, str):

/jupyter/packages/pandas/core/base.py in _get_cython_func(self, arg)
    335         if we define an internal function for this argument, return it
    336         """
--> 337         return self._cython_table.get(arg)
    338 
    339     def _is_builtin_func(self, arg):

TypeError: unhashable type: 'list'
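
The last frame of the traceback is a plain dictionary lookup (`self._cython_table.get(arg)`), and a dictionary key must be hashable. The following is a minimal sketch in plain Python of that failure mode, not pandas code: `cython_table` and `lookup` are hypothetical stand-ins, and the real table's contents and behaviour may differ between pandas versions.

```python
# Hypothetical stand-in for the lookup table seen in the traceback.
cython_table = {sum: "sum", max: "max"}

def lookup(func):
    """Mimic a dict-based lookup; report the error instead of raising it."""
    try:
        return cython_table.get(func)
    except TypeError as exc:
        return f"TypeError: {exc}"

# A string is hashable, so the lookup simply misses and returns None.
result_str = lookup("skew")

# A list is not hashable, so the lookup itself fails before any
# function is applied to the data.
result_list = lookup(["skew", "mean"])

print(result_str)
print(result_list)
```

So in the pandas version shown in the traceback, the list is rejected while transform is still resolving its argument, before any group is processed.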

I don't think it is possible with transform. There are at least two workarounds. Either merge the result of groupby.agg back onto the original dataframe:

tmp_ = (
    raw_transactions
    .groupby(['entity_id','transaction_month'])[['inflow','outflow']]
    .agg([
        'sum','skew',
        ( 'coef_var', lambda x: x.std() / x.mean() ),
        ( 'kurtosis', lambda x: x.kurtosis() )
        ]) #no reset_index here
)
# need to flatten multiindex columns
tmp_.columns = ['_'.join(cols) for cols in tmp_.columns] 

# then merge with original dataframe
res = raw_transactions.merge(tmp_, on=['entity_id','transaction_month'])
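
To make the merge approach easier to try out, here is a small self-contained sketch. The toy data and the names `keys` and `stats` are invented for illustration and are not the asker's dataset. It uses only 'sum' and 'mean' to keep the numbers easy to check, and calls reset_index before merging so the join keys are ordinary columns.

```python
import pandas as pd

# Toy data, invented for illustration only.
raw_transactions = pd.DataFrame({
    "entity_id": [51, 51, 51, 52, 52],
    "transaction_month": ["2018-07-01"] * 3 + ["2018-07-01", "2018-08-01"],
    "inflow": [10.0, 20.0, 30.0, 5.0, 7.0],
    "outflow": [1.0, 2.0, 3.0, 4.0, 6.0],
})

keys = ["entity_id", "transaction_month"]

# One row per group, with (column, statistic) MultiIndex columns.
stats = raw_transactions.groupby(keys)[["inflow", "outflow"]].agg(["sum", "mean"])

# Flatten the MultiIndex columns, e.g. ('inflow', 'sum') -> 'inflow_sum'.
stats.columns = ["_".join(cols) for cols in stats.columns]

# Broadcast the group-level statistics back onto every original row.
res = raw_transactions.merge(stats.reset_index(), on=keys, how="left")

print(res)
```

The result has the same number of rows as the original frame, which is the shape transform would have produced.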

Or use a list comprehension over the functions, call transform once per function, and concat the results with the original data:

# group once
gr = raw_transactions.groupby(['entity_id'])[['inflow','outflow']]

# concat each transformed dataframe with the original data
res = pd.concat([raw_transactions] + 
                [gr.transform(func) 
                 for func in ('skew', 'mean', lambda x: x.std() / x.mean() )], 
                axis=1, keys=('', 'skew', 'mean', 'coef_var'))

Then you can tidy up the column names, since concat with keys produces MultiIndex columns.
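
One possible way to handle the column names is sketched below, under assumptions: the toy data, the `funcs` mapping and the `<column>_<statistic>` naming scheme are all choices made for this example, not part of the original answer.

```python
import pandas as pd

# Toy data, invented for illustration only.
raw_transactions = pd.DataFrame({
    "entity_id": [51, 51, 52, 52],
    "inflow": [10.0, 30.0, 4.0, 8.0],
    "outflow": [2.0, 6.0, 1.0, 3.0],
})

# Group once and reuse the groupby object for every function.
gr = raw_transactions.groupby("entity_id")[["inflow", "outflow"]]

# Output label -> function passed to transform (hypothetical naming).
funcs = {
    "mean": "mean",
    "coef_var": lambda x: x.std() / x.mean(),
}

# The empty key labels the block of original columns.
res = pd.concat(
    [raw_transactions] + [gr.transform(f) for f in funcs.values()],
    axis=1,
    keys=["", *funcs],
)

# Flatten (key, column) pairs: original columns keep their names,
# derived columns become '<column>_<key>'.
res.columns = [f"{col}_{key}" if key else col for key, col in res.columns]

print(res)
```

Keeping the labels and functions in one dict means the keys passed to concat cannot drift out of step with the list of functions.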
