Use Apply on a SeriesGroupBy Object where conditions are met
I have a DataFrame df1:
df1.head() =
id ret eff
1469 2300 -0.010879 4480.0
328 2300 -0.000692 -4074.0
1376 2300 -0.009551 4350.0
2110 2300 -0.014013 5335.0
849 2300 -0.286490 -9460.0
I would like to create a new column that contains the normalized values of the column df1['eff'].
In other words, I would like to group df1['eff'] by df1['id'], find the max value (mx = df1['eff'].max()) and the min value (mn = df1['eff'].min()) of each group, and divide each value of the column df1['eff'] element-wise by mx or mn, depending on whether df1['eff'] > 0 or df1['eff'] < 0.
The code that I have written is the following:
df1['normd'] = df1.groupby('id')['eff'].apply(lambda x: x/x.max() if x > 0 else x/x.min())
However, Python throws the following error:
*** ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(),
a.item(), a.any() or a.all().
Since df1.groupby('id')['eff'] is a SeriesGroupBy object, I decided to use map(). But again Python throws the following error:
*** AttributeError: Cannot access callable attribute 'map' of 'SeriesGroupBy' objects, try using the 'apply' method
Many thanks in advance.
You can use a custom function f, which makes it easy to add print calls for debugging. Inside the function, x is the Series of one group, so the condition has to be evaluated element-wise with numpy.where rather than with a plain if. The output of numpy.where is a NumPy array, so you need to convert it back to a Series with the original index:
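import numpy as np
import pandas as pd

def f(x):
    # x is the 'eff' Series of one group; uncomment the prints to inspect it
    #print (x)
    #print (x/x.max())
    #print (x/x.min())
    return pd.Series(np.where(x>0, x/x.max(), x/x.min()), index=x.index)

# group_keys=False keeps the original row index so the result aligns with df1 (matters on pandas 2.0 and later)
df1['normd'] = df1.groupby('id', group_keys=False)['eff'].apply(f)
print(df1)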
id ret eff normd
1469 2300 -0.010879 4480.0 0.839738
328 2300 -0.000692 -4074.0 0.430655
1376 2300 -0.009551 4350.0 0.815370
2110 2300 -0.014013 5335.0 1.000000
849 2300 -0.286490 -9460.0 1.000000
This is the same as:
df1['normd'] = (df1.groupby('id', group_keys=False)['eff']
                   .apply(lambda x: pd.Series(np.where(x>0,
                                                       x/x.max(),
                                                       x/x.min()), index=x.index)))
print(df1)
id ret eff normd
1469 2300 -0.010879 4480.0 0.839738
328 2300 -0.000692 -4074.0 0.430655
1376 2300 -0.009551 4350.0 0.815370
2110 2300 -0.014013 5335.0 1.000000
849 2300 -0.286490 -9460.0 1.000000
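As a possible alternative that is not part of the original answer, the same normalization can be sketched without apply: transform('max') and transform('min') broadcast each group's extreme back onto the original rows, and a single numpy.where then picks the divisor per row. The sample frame below reuses the five rows from the question; the two rows with id 2301 are hypothetical and only added to show that the scaling is done per group.

```python
import numpy as np
import pandas as pd

# Five rows from the question, plus two hypothetical rows (id 2301) for illustration.
df1 = pd.DataFrame(
    {
        "id": [2300, 2300, 2300, 2300, 2300, 2301, 2301],
        "ret": [-0.010879, -0.000692, -0.009551, -0.014013, -0.286490, 0.01, -0.02],
        "eff": [4480.0, -4074.0, 4350.0, 5335.0, -9460.0, 10.0, -20.0],
    },
    index=[1469, 328, 1376, 2110, 849, 1, 2],
)

grouped = df1.groupby("id")["eff"]
mx = grouped.transform("max")  # per-group max, aligned to df1's index
mn = grouped.transform("min")  # per-group min, aligned to df1's index

# Positive values are divided by the group max, all others by the group min.
df1["normd"] = np.where(df1["eff"] > 0, df1["eff"] / mx, df1["eff"] / mn)
print(df1)
```

Because transform returns a Series with the same index as df1, no index bookkeeping is needed inside a custom function. Note that a group whose max or min is 0 would produce a division by zero for some rows; that case is not handled here.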