
Aggregating data in Python using more than one measure from a formula

Suppose I have a series of data that I want to aggregate by Cat:

Cat   Volume   Result
A     45       4
A     57       3
B     56       3
C     45       1
C     55       2
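For anyone who wants to reproduce this, the sample above can be built as a small DataFrame. This is only a sketch: the name OrgData is taken from the code in this question, and the integer dtypes are an assumption.

```python
import pandas as pd

# Sample data from the table above; integer dtypes are assumed
OrgData = pd.DataFrame({
    "Cat": ["A", "A", "B", "C", "C"],
    "Volume": [45, 57, 56, 45, 55],
    "Result": [4, 3, 3, 1, 2],
})

print(OrgData)
```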

I would like to compute the variance, skewness and kurtosis of Volume and the maximum of Result for each Cat. I know how to do it one statistic at a time, but I would like to do it neatly in a single step, with something like this:

def f(row):
    row['ResultM']=row['Result'].max()
    row['Variance'] = pd.DataFrame(scipy.stats.moment(row['Volume'], moment=[2,3,4]))
    return

TestData=OrgData.groupby('Id').apply(f)

But it does not work. Can anyone suggest how I can correct my code? Thanks.

Edit

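import pandas as pd
import scipy.stats

def f(x):
    # scipy.stats.moment returns the 2nd, 3rd and 4th central moments, so the
    # 'skew' and 'kurtosis' columns hold unstandardized values
    df = pd.DataFrame(scipy.stats.moment(x.Volume.astype(int),moment=[2,3,4]),index=['var','skew','kurtosis']).T
    df['result_max'] = x.Result.astype(int).max()
    return df

df.groupby('Cat').apply(f)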
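One thing to watch with the moment-based version: scipy.stats.moment returns central moments, which are not the same numbers as skewness and kurtosis. Here is a minimal sketch of the relationship, using the Volume column as a plain array. The variable names are mine, and it assumes the biased, population-style definitions that scipy.stats.skew and scipy.stats.kurtosis use by default.

```python
import numpy as np
from scipy import stats

# Volume values from the question, ignoring the grouping for simplicity
volume = np.array([45, 57, 56, 45, 55], dtype=float)

# Central moments of order 2, 3 and 4
m2 = stats.moment(volume, 2)
m3 = stats.moment(volume, 3)
m4 = stats.moment(volume, 4)

# Standardize the moments to get skewness and excess kurtosis
skew_from_moments = m3 / m2 ** 1.5
kurtosis_from_moments = m4 / m2 ** 2 - 3

print(m2, np.var(volume))                            # population variance
print(skew_from_moments, stats.skew(volume))         # biased skewness
print(kurtosis_from_moments, stats.kurtosis(volume)) # excess kurtosis
```

So the second moment can be used directly as the population variance, but the third and fourth need to be divided by powers of the second before they can be read as skewness and kurtosis.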

Let's try this:

import numpy as np
from scipy import stats

OrgData.groupby('Cat').agg({'Result':'max','Volume':[stats.skew,np.var,stats.kurtosis]})

Output:

    Result Volume               
       max   skew   var kurtosis
Cat                             
A        4      0  72.0       -2
B        3      0   NaN       -3
C        2      0  50.0       -2
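If the two-level column header is awkward, named aggregation (pandas 0.25 or later) gives flat column names. This is a sketch that assumes the data is in a DataFrame called OrgData, as above; the output column names are made up for illustration. The string 'var' is the pandas sample variance (ddof=1), which matches the 72.0 and 50.0 in the output above.

```python
import pandas as pd
from scipy import stats

# Sample data from the question
OrgData = pd.DataFrame({
    "Cat": ["A", "A", "B", "C", "C"],
    "Volume": [45, 57, 56, 45, 55],
    "Result": [4, 3, 3, 1, 2],
})

# One keyword per output column: (input column, aggregation)
summary = OrgData.groupby("Cat").agg(
    result_max=("Result", "max"),
    volume_var=("Volume", "var"),                # sample variance, ddof=1
    volume_skew=("Volume", stats.skew),          # biased skewness
    volume_kurtosis=("Volume", stats.kurtosis),  # excess kurtosis
)

print(summary)
```

Groups with a single row (B here) have too few points for these statistics, so they may produce NaN or a warning depending on the library versions.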

