
pandas custom aggregation function

I have a pandas DataFrame on which the following command works:

house.groupby(['place_name'])['index_nsa'].agg(['first','last'])

It gives me what I want. Now I want to make a custom aggregation value that gives me the percentage change between the first and the last value.

I got an error for doing math on the values, so I assumed that I have to turn them into numbers.

house.groupby(['place_name'])['index_nsa'].agg({"change in %":[(int('last')-int('first')/int('first')]})

Unfortunately, I only get a syntax error at the last bracket, and I cannot find the cause.

Does anyone see where I went wrong?

You will need to define and pass a callback to agg here. You can do that in-line with a lambda function:

house.groupby(['place_name'])['index_nsa'].agg([
    ("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0])])

Look closely at the .agg call: to rename the output column, you must pass a list of tuples of the form [(new_name, agg_func), ...].
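To see the list-of-tuples form in action, here is a minimal, self-contained sketch. The toy `house` data below is made up purely for illustration (the real DataFrame is not shown in the question); only the column names `place_name` and `index_nsa` come from the original command.

```python
import pandas as pd

# Hypothetical stand-in for the `house` DataFrame; the values are invented.
house = pd.DataFrame({
    "place_name": ["A", "A", "A", "B", "B"],
    "index_nsa": [100.0, 110.0, 125.0, 200.0, 150.0],
})

# Each tuple is (output column name, aggregation function).
result = house.groupby("place_name")["index_nsa"].agg([
    ("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0]),
])
print(result)
```

Note that the function returns a fraction (0.25 for group A here), so multiply by 100 if you want the number expressed as a percentage.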

If you want to avoid the lambda at the cost of some verbosity, you can define a named function instead:

def first_last_pct(ser):
    first, last = ser.iloc[0], ser.iloc[-1]
    return (last - first) / first

house.groupby(['place_name'])['index_nsa'].agg([("change in %", first_last_pct)])
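Another option, not part of the answer above, is to reuse the 'first'/'last' aggregation that already works in the question and do the arithmetic on the resulting columns. The sketch below assumes the same made-up toy data as a stand-in for `house`. One caveat: the built-in 'first' and 'last' skip missing values, whereas `iloc[0]` and `iloc[-1]` do not, so the two approaches can differ when a group starts or ends with NaN.

```python
import pandas as pd

# Hypothetical stand-in for the `house` DataFrame; the values are invented.
house = pd.DataFrame({
    "place_name": ["A", "A", "A", "B", "B"],
    "index_nsa": [100.0, 110.0, 125.0, 200.0, 150.0],
})

# Aggregate once, then compute the relative change column-wise.
first_last = house.groupby("place_name")["index_nsa"].agg(["first", "last"])
change = (first_last["last"] - first_last["first"]) / first_last["first"]
change = change.rename("change in %")
print(change)
```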
