
Calculations within a pandas aggregation

I am trying to perform a calculation within a pandas aggregation, so that its result is included alongside the other aggregated columns. My attempt is below (data is a pandas DataFrame).

data = data.groupby(['type', 'name']).agg({'values': [np.min, np.max, 100 * sum([('values' > 3200)] / [np.size])]})

The formula I am trying to calculate is below:

100 * sum((values > 3200) / (np.size))

Here np.size is the number of values in the group, and only values > 3200 are counted in the numerator, so the result is the percentage of a group's values that exceed 3200. Any pointers on how to perform calculations like this within an aggregation would be a great help.
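For a single group, the formula is the count of values above 3200 divided by the group size, times 100. The sketch below treats the four sample values as if they formed one group, purely for illustration. It shows that the sum/size form and the boolean-mean form give the same number.

```python
import pandas as pd

# The four sample values, treated as one group for illustration only
group = pd.Series([2500, 3300, 3500, 2800])

# 100 * (count of values > 3200) / (number of values in the group)
pct_sum = 100 * (group > 3200).sum() / group.size

# Equivalent: the mean of a boolean Series is the fraction of True values
pct_mean = 100 * (group > 3200).mean()

print(pct_sum, pct_mean)  # 50.0 50.0
```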

Example input data (the actual dataset is much larger). The repeated values are due to the aggregation.

type, name, values
apple, blue, 2500
orange, green, 2800
peach, black, 3300
lemon, white, 3500
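To experiment with the sample, the rows above can be loaded directly from text. This sketch assumes the comma-plus-space layout shown. The skipinitialspace=True option is what strips the space after each comma.

```python
import io

import pandas as pd

# The sample rows from the question, as comma-separated text
csv_text = """type, name, values
apple, blue, 2500
orange, green, 2800
peach, black, 3300
lemon, white, 3500
"""

# skipinitialspace=True drops the space after each comma, so the column
# names become 'type', 'name', 'values' rather than ' name', ' values'
data = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(data)
```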

Desired example output (the numbers are placeholders, since I have not yet been able to perform the calculation):

type, name, values, np.min, np.max, calculation
apple, blue, 2500, 1200, 40000, 2300
orange, green, 2800, 1200, 5000, 2500

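When you pass .agg a dictionary on a DataFrame groupby, the keys are the input columns and the values are the function (or list of functions) to apply to each. The problem is the third entry in your list: 100 * sum([('values' > 3200)] / [np.size]) is not a function but an expression that Python evaluates immediately, before pandas ever sees it, and comparing the string 'values' with 3200 already fails. You need to wrap the calculation in a function and name the output columns separately.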
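For comparison, here is a minimal sketch that keeps the question's dictionary-of-lists form on a DataFrame groupby and only replaces the third entry with a real function. The helper name pct_over_3200 is hypothetical, and the data are the question's sample rows.

```python
import pandas as pd

data = pd.DataFrame({
    "type": ["apple", "orange", "peach", "lemon"],
    "name": ["blue", "green", "black", "white"],
    "values": [2500, 2800, 3300, 3500],
})

def pct_over_3200(series):
    """Percentage of the group's values that are greater than 3200."""
    return 100 * (series > 3200).mean()

# One input column with a list of aggregations; the output columns form a
# MultiIndex: ('values', 'min'), ('values', 'max'), ('values', 'pct_over_3200')
result = data.groupby(["type", "name"]).agg({"values": ["min", "max", pct_over_3200]})
print(result)
```

Because the function is named, pandas uses its name as the third column label. An anonymous lambda would show up as '<lambda>' instead.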

What you want looks more like this:

data = data.groupby(['type', 'name'])['values'].agg(min='min', max='max', calculation=calculation)

Where calculation is your formula rewritten as either a lambda or a named function that takes one group's values (a Series) and returns a single number.
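You may also want flat output columns laid out like the desired table, with type and name as ordinary columns. One option is named aggregation with pd.NamedAgg followed by reset_index. This sketch assumes pandas 0.25 or later, and the lambda is just one way to write the question's formula.

```python
import pandas as pd

data = pd.DataFrame({
    "type": ["apple", "orange", "peach", "lemon"],
    "name": ["blue", "green", "black", "white"],
    "values": [2500, 2800, 3300, 3500],
})

# The question's formula: 100 * (count of values > 3200) / group size
calculation = lambda s: 100 * (s > 3200).sum() / s.size

# Each keyword becomes an output column; pd.NamedAgg pairs an input column
# with the aggregation to apply to it
result = (
    data.groupby(["type", "name"])
    .agg(
        min=pd.NamedAgg(column="values", aggfunc="min"),
        max=pd.NamedAgg(column="values", aggfunc="max"),
        calculation=pd.NamedAgg(column="values", aggfunc=calculation),
    )
    .reset_index()  # turn 'type' and 'name' back into ordinary columns
)
print(result)
```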

You need to define a function that acts on each group and returns the percentage of values greater than 3200, then pass it, along with the other functions, into .agg:

# Percentage of values in the group that are greater than 3200
func = lambda series: 100 * (series > 3200).mean()
data.groupby(['type', 'name'])['values'].agg(min='min', max='max', calculation=func)

The mean of a boolean Series is the fraction of True values, so multiplying it by 100 gives the percentage. This is a neater way of calculating it than dividing a sum by the size. Also, you can pass common function names such as 'min' and 'max' as strings.
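The question mentions a much larger real dataset. One alternative worth considering there is to precompute the boolean test once, so that every aggregation is a built-in ('min', 'max', 'mean') rather than a Python lambda called per group. This is a sketch under that assumption, not a benchmarked result, and the helper column name over is hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    "type": ["apple", "orange", "peach", "lemon"],
    "name": ["blue", "green", "black", "white"],
    "values": [2500, 2800, 3300, 3500],
})

# Hypothetical helper column: True where the value exceeds 3200
result = (
    data.assign(over=data["values"].gt(3200))
    .groupby(["type", "name"])
    .agg(
        min=("values", "min"),
        max=("values", "max"),
        calculation=("over", "mean"),  # fraction of True values per group
    )
)
# Scale the fraction to a percentage
result["calculation"] *= 100
print(result)
```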
