Use Apply on a SeriesGroupBy Object where conditions are met

I have a DataFrame df1:

 df1.head() = 

           id      ret     eff
    1469  2300 -0.010879  4480.0
    328   2300 -0.000692 -4074.0
    1376  2300 -0.009551  4350.0
    2110  2300 -0.014013  5335.0
    849   2300 -0.286490 -9460.0

I would like to create a new column that contains the normalized values of the column df1['eff'].
In other words, I would like to group df1['eff'] by df1['id'], find the max value (mx = df1['eff'].max()) and the min value (mn = df1['eff'].min()) of each group, and divide each value of df1['eff'] element-wise by mx if the value is positive or by mn if it is negative.

The code that I have written is the following:

df1['normd'] = df1.groupby('id')['eff'].apply(lambda x: x/x.max() if x > 0 else x/x.min())

However, Python throws the following error:

*** ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(),
 a.item(), a.any() or a.all().

Since df1.groupby('id')['eff'] is a SeriesGroupBy object, I decided to use map(). But again Python throws the following error:

 *** AttributeError: Cannot access callable attribute 'map' of 'SeriesGroupBy' objects, try using the 'apply' method

Many thanks in advance.

You can use a custom function f, which makes it easy to add print calls for debugging. Inside apply, x is a Series holding one group, so a scalar if/else on x > 0 is ambiguous; compare element-wise with numpy.where instead. The output of numpy.where is a NumPy array, so convert it back to a Series that keeps the original index:

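import numpy as np
import pandas as pd

def f(x):
    # x is the 'eff' Series of a single group; uncomment the prints to inspect it
    #print (x)
    #print (x/x.max())
    #print (x/x.min())
    return pd.Series(np.where(x>0, x/x.max(), x/x.min()), index=x.index)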


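# group_keys=False keeps the original index so the result aligns with df1
# (newer pandas versions otherwise prepend the group key to the index)
df1['normd'] = df1.groupby('id', group_keys=False)['eff'].apply(f)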
print (df1)
        id       ret     eff     normd
1469  2300 -0.010879  4480.0  0.839738
328   2300 -0.000692 -4074.0  0.430655
1376  2300 -0.009551  4350.0  0.815370
2110  2300 -0.014013  5335.0  1.000000
849   2300 -0.286490 -9460.0  1.000000

This is the same as:

df1['normd'] = (df1.groupby('id', group_keys=False)['eff']
                   .apply(lambda x: pd.Series(np.where(x>0,
                                                       x/x.max(),
                                                       x/x.min()), index=x.index)))
print (df1)
        id       ret     eff     normd
1469  2300 -0.010879  4480.0  0.839738
328   2300 -0.000692 -4074.0  0.430655
1376  2300 -0.009551  4350.0  0.815370
2110  2300 -0.014013  5335.0  1.000000
849   2300 -0.286490 -9460.0  1.000000
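
As an alternative sketch that is not part of the original answer, the same normalization can probably be written without apply at all: broadcast each group's max and min back to the rows with transform, then choose element-wise with numpy.where. The sample frame below is rebuilt from the five rows shown in the question, and the names grouped, mx and mn are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the df1.head() output in the question
df1 = pd.DataFrame(
    {
        "id": [2300, 2300, 2300, 2300, 2300],
        "ret": [-0.010879, -0.000692, -0.009551, -0.014013, -0.286490],
        "eff": [4480.0, -4074.0, 4350.0, 5335.0, -9460.0],
    },
    index=[1469, 328, 1376, 2110, 849],
)

grouped = df1.groupby("id")["eff"]

# transform returns a Series aligned with df1, repeating each group's
# max (or min) on every row of that group
mx = grouped.transform("max")
mn = grouped.transform("min")

# Positive values are divided by the group max, all others by the group min
df1["normd"] = np.where(df1["eff"] > 0, df1["eff"] / mx, df1["eff"] / mn)
print(df1)
```

Because transform keeps the index of df1, no conversion back to a Series and no group_keys argument is needed here. Note that a value of exactly 0 falls into the "divide by min" branch, as it does in the apply version.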
