Multiple functions with pandas transform

Question

I have a dataset that looks like this:

   entity_id transaction_date transaction_month  net_flow    inflow   outflow
0         51       2018-07-02        2018-07-01  10161.06  20161.06  10000.00
1         51       2018-07-03        2018-07-01   5823.73   5867.37     43.64
2         51       2018-07-05        2018-07-01  17835.79  24107.29   6271.50
3         51       2018-07-06        2018-07-01  -3544.72  31782.84  35327.56
4         51       2018-07-09        2018-07-01  18252.42  18332.42     80.00

I am trying to calculate rolling metrics per entity_id using rolling and transform. I have multiple variables I'd like to create and would prefer to compute them in a single call.

For example, if I were to create these measures using agg, I would execute something like this:

transactions = (
    raw_transactions
    .groupby(['entity_id','transaction_month'])[['inflow','outflow']]
    .agg([
        'sum','skew',
        ( 'coef_var', lambda x: x.std() / x.mean() ),
        ( 'kurtosis', lambda x: x.kurtosis() )
        ])
    .reset_index()
)

However, I'm unable to reproduce this using transform. When I pass the functions as either a dict or a list, I get a TypeError because lists and dicts are unhashable.

>>> transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-62-4ef49d836b3f> in <module>
----> 1 transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])

/jupyter/packages/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
   1354 
   1355         # optimized transforms
-> 1356         func = self._get_cython_func(func) or func
   1357 
   1358         if not isinstance(func, str):

/jupyter/packages/pandas/core/base.py in _get_cython_func(self, arg)
    335         if we define an internal function for this argument, return it
    336         """
--> 337         return self._cython_table.get(arg)
    338 
    339     def _is_builtin_func(self, arg):

TypeError: unhashable type: 'list'

Answer 1

I don't think it is possible with transform. You have (at least) two workarounds. Either merge the result of groupby.agg onto the original dataframe:

tmp_ = (
    raw_transactions
    .groupby(['entity_id','transaction_month'])[['inflow','outflow']]
    .agg([
        'sum','skew',
        ( 'coef_var', lambda x: x.std() / x.mean() ),
        ( 'kurtosis', lambda x: x.kurtosis() )
        ]) #no reset_index here
)
# need to flatten multiindex columns
tmp_.columns = ['_'.join(cols) for cols in tmp_.columns] 

# then merge with original dataframe
res = raw_transactions.merge(tmp_, on=['entity_id','transaction_month'])
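
For reference, here is a minimal self-contained sketch of the merge approach, run on the five sample rows from the question. The DataFrame construction, the named helper coef_var (used instead of a tuple with a lambda) and the reduced set of statistics are assumptions made for illustration, not part of the original code.

```python
import pandas as pd

# Sample rows copied from the question (net_flow omitted for brevity)
raw_transactions = pd.DataFrame({
    "entity_id": [51] * 5,
    "transaction_date": pd.to_datetime(
        ["2018-07-02", "2018-07-03", "2018-07-05", "2018-07-06", "2018-07-09"]
    ),
    "transaction_month": pd.to_datetime(["2018-07-01"] * 5),
    "inflow": [20161.06, 5867.37, 24107.29, 31782.84, 18332.42],
    "outflow": [10000.00, 43.64, 6271.50, 35327.56, 80.00],
})

def coef_var(x):
    # Coefficient of variation; the function name becomes the column label
    return x.std() / x.mean()

keys = ["entity_id", "transaction_month"]

# One row of statistics per group
stats = (
    raw_transactions
    .groupby(keys)[["inflow", "outflow"]]
    .agg(["sum", "mean", coef_var])
)
# Flatten the MultiIndex columns, e.g. ('inflow', 'sum') -> 'inflow_sum'
stats.columns = ["_".join(cols) for cols in stats.columns]

# Broadcast the group statistics back onto every original row
res = raw_transactions.merge(stats.reset_index(), on=keys, how="left")
print(res.columns.tolist())
```

The result has one row per original transaction, with the group-level statistics repeated on each row, which is what transform would have returned.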

or use a list comprehension over the different functions, calling transform once per function and concatenating the results with the original data:

# group once
gr = raw_transactions.groupby(['entity_id'])[['inflow','outflow']]

# concat the transformed dataframe for each function with the original data
res = pd.concat([raw_transactions] + 
                [gr.transform(func) 
                 for func in ('skew', 'mean', lambda x: x.std() / x.mean() )], 
                axis=1, keys=('', 'skew', 'mean', 'coef_var'))

Then you can tidy up the column names.
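
One possible way to tidy the columns is sketched below: the concat with keys produces MultiIndex columns, which are flattened into names such as inflow_skew. The small two-entity dataset (the rows for entity 52 are made up), the funcs dict and the coef_var helper are assumptions for illustration only.

```python
import pandas as pd

# Toy data: entity 51 rows come from the question, entity 52 rows are invented
raw_transactions = pd.DataFrame({
    "entity_id": [51, 51, 51, 52, 52, 52],
    "inflow": [20161.06, 5867.37, 24107.29, 31782.84, 18332.42, 100.0],
    "outflow": [10000.00, 43.64, 6271.50, 35327.56, 80.00, 50.0],
})

def coef_var(x):
    # Coefficient of variation of a column within one group
    return x.std() / x.mean()

# Label -> function passed to transform (hypothetical helper mapping)
funcs = {"skew": "skew", "mean": "mean", "coef_var": coef_var}

# Group once, then transform once per function
gr = raw_transactions.groupby("entity_id")[["inflow", "outflow"]]

res = pd.concat(
    [raw_transactions] + [gr.transform(f) for f in funcs.values()],
    axis=1,
    keys=["", *funcs],  # '' marks the original, untransformed columns
)

# Flatten ('skew', 'inflow') -> 'inflow_skew'; keep original names unchanged
res.columns = [f"{col}_{key}" if key else col for key, col in res.columns]
print(res.columns.tolist())
```

Keeping the labels and functions together in one dict avoids the keys tuple drifting out of sync with the list of functions.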
