Using multiple functions with pandas transform
I have a dataset that looks like this:
   entity_id  transaction_date  transaction_month  net_flow    inflow   outflow
0         51        2018-07-02         2018-07-01  10161.06  20161.06  10000.00
1         51        2018-07-03         2018-07-01   5823.73   5867.37     43.64
2         51        2018-07-05         2018-07-01  17835.79  24107.29   6271.50
3         51        2018-07-06         2018-07-01  -3544.72  31782.84  35327.56
4         51        2018-07-09         2018-07-01  18252.42  18332.42     80.00
I am trying to calculate rolling metrics across the entity_id field using rolling and transform. I have multiple variables I'd like to create and would prefer to run them in a single call.
For example, if I were to create these measures using agg, I would execute something like this:
transactions = (
    raw_transactions
    .groupby(['entity_id','transaction_month'])[['inflow','outflow']]
    .agg([
        'sum','skew',
        ( 'coef_var', lambda x: x.std() / x.mean() ),
        ( 'kurtosis', lambda x: x.kurtosis() )
    ])
    .reset_index()
)
However, I'm unable to reproduce this using transform. When I try to pass the functions as either a dict or a list, I get a TypeError because lists and dicts are unhashable.
>>> transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-4ef49d836b3f> in <module>
----> 1 transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])
/jupyter/packages/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
1354
1355 # optimized transforms
-> 1356 func = self._get_cython_func(func) or func
1357
1358 if not isinstance(func, str):
/jupyter/packages/pandas/core/base.py in _get_cython_func(self, arg)
335 if we define an internal function for this argument, return it
336 """
--> 337 return self._cython_table.get(arg)
338
339 def _is_builtin_func(self, arg):
TypeError: unhashable type: 'list'
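The last frame of the traceback is a plain dictionary lookup (`self._cython_table.get(arg)`), so the error comes from using the list itself as a dictionary key. A minimal sketch of the same failure outside pandas, where `table` is a hypothetical stand-in for the internal lookup table and its contents are made up:

```python
# `table` is a hypothetical stand-in for pandas' internal lookup table;
# the keys and values here are invented for illustration.
table = {"sum": "fast_sum", "mean": "fast_mean"}

# A string is hashable, so it can be looked up as a key.
found = table.get("sum")
print(found)

# A list is not hashable, so the lookup raises before any key comparison.
message = None
try:
    table.get(["skew", "mean"])
except TypeError as exc:
    message = str(exc)
    print(message)
```

The exact wording of the message can vary between Python versions, but it names the unhashable list in each case.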
I don't think it is possible with transform. You have (at least) two workarounds. Either merge the result of groupby.agg onto the original dataframe:
tmp_ = (
    raw_transactions
    .groupby(['entity_id','transaction_month'])[['inflow','outflow']]
    .agg([
        'sum','skew',
        ( 'coef_var', lambda x: x.std() / x.mean() ),
        ( 'kurtosis', lambda x: x.kurtosis() )
    ]) # no reset_index here
)
# need to flatten multiindex columns
tmp_.columns = ['_'.join(cols) for cols in tmp_.columns]
# then merge with original dataframe
res = raw_transactions.merge(tmp_, on=['entity_id','transaction_month'])
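To see that the merge broadcasts each group's statistics back onto every original row, as transform would, here is a small self-contained sketch. The data is made up, `keys` and `stats` are names chosen for illustration, and a named function `coef_var` replaces the lambda so that its name becomes the column label. It also calls `reset_index()` before merging so the join keys are ordinary columns.

```python
import pandas as pd

# Made-up data: two entities within a single month.
raw_transactions = pd.DataFrame({
    "entity_id": [1, 1, 1, 2, 2],
    "transaction_month": ["2018-07-01"] * 5,
    "inflow": [10.0, 20.0, 30.0, 5.0, 15.0],
    "outflow": [1.0, 2.0, 3.0, 4.0, 6.0],
})

def coef_var(x):
    """Coefficient of variation: sample standard deviation over mean."""
    return x.std() / x.mean()

keys = ["entity_id", "transaction_month"]

# One row per group, with a (column, statistic) MultiIndex on the columns.
stats = raw_transactions.groupby(keys)[["inflow", "outflow"]].agg(
    ["sum", "mean", coef_var]
)

# Flatten ('inflow', 'sum') into 'inflow_sum', and so on.
stats.columns = ["_".join(cols) for cols in stats.columns]

# A left merge keeps every original row and repeats the group statistics.
res = raw_transactions.merge(stats.reset_index(), on=keys, how="left")
print(res)
```

The result has the same number of rows as `raw_transactions`, with one extra column per (column, statistic) pair.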
or use a list comprehension over the different functions, calling transform once per function, and concat the results with the original data:
# group once
gr = raw_transactions.groupby(['entity_id'])[['inflow','outflow']]
# concat the transformed dataframe for each function with the original data
res = pd.concat([raw_transactions] +
                [gr.transform(func)
                 for func in ('skew', 'mean', lambda x: x.std() / x.mean() )],
                axis=1, keys=('', 'skew', 'mean', 'coef_var'))
Then you can flatten or rename the resulting MultiIndex columns as needed.
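One possible way to tidy the column names after the concat, sketched on made-up data. The `funcs` mapping, the `coef_var` helper, and the `column_statistic` naming scheme are illustrative choices, not something the approach requires.

```python
import pandas as pd

# Made-up data: two entities.
raw = pd.DataFrame({
    "entity_id": [1, 1, 1, 2, 2],
    "inflow": [10.0, 20.0, 30.0, 5.0, 15.0],
    "outflow": [1.0, 2.0, 3.0, 4.0, 6.0],
})

def coef_var(x):
    """Coefficient of variation: sample standard deviation over mean."""
    return x.std() / x.mean()

# Map the label wanted in the output to the function passed to transform.
funcs = {"mean": "mean", "max": "max", "coef_var": coef_var}

# Group once, then run one transform per function.
gr = raw.groupby("entity_id")[["inflow", "outflow"]]
res = pd.concat(
    [raw] + [gr.transform(f) for f in funcs.values()],
    axis=1,
    keys=[""] + list(funcs),  # "" labels the original columns
)

# Columns are (key, column) pairs; keep the original names unchanged and
# suffix the transformed ones with the statistic's label.
res.columns = [f"{col}_{key}" if key else col for key, col in res.columns]
print(res.columns.tolist())
```

Keeping the labels and functions in one mapping avoids the `keys` tuple drifting out of step with the list of functions.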