Compute average from grouped data-frames with different number of rows

I have a list of grouped dataframes, shown below, each with the columns id_ and val. The dataframes can have different numbers of rows.

df=
[id_  val
2       5
2       15,          
id_  val
2       5
3      25
2       4
3      20,          
id_  val
2      10
3      10]

I want to iterate through each dataframe in the list and pass its two columns to a function computeAverage(df.id_,df.val), which first multiplies the id_ column by the val column row by row, then sums the products and returns the average for that dataframe.

i.e., (2*5 + 2*15)/df.shape[0], (2*5 + 3*25 + 2*4 + 3*20)/df.shape[0], and (2*10 + 3*10)/df.shape[0], where df.shape[0] is the number of rows in the respective dataframe (2, 4, and 2).

This is what I have tried so far, but it fails to iterate through all rows of each dataframe.

def get_df(df):

    
    for i in range(len(df)):
        id_ = df[i]['id_']
        val = df[i]['val']

def computeAverage(df.id_,df.val):

    sum_ = 0
    multiply = df.id_ * df.val
    sum_ += multiply
    avg = sum_ / df[i].shape[0]
    return avg
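
For reference, the attempt above cannot run as written: df.id_ is not a valid parameter name, get_df never calls computeAverage or returns anything, and sum_ += multiply accumulates a whole Series rather than a single number. Below is a minimal sketch of what the loop seems to be aiming for, keeping the same structure. The names compute_average, get_averages, and frames are illustrative assumptions, not from the original post, and the float() conversion is added only so the results are plain Python floats.

```python
import pandas as pd


def compute_average(id_, val):
    # Multiply the two columns element-wise, sum the products,
    # then divide by the number of rows.
    products = id_ * val
    return float(products.sum() / len(products))


def get_averages(frames):
    # Call compute_average once per dataframe and collect the results.
    averages = []
    for frame in frames:
        averages.append(compute_average(frame["id_"], frame["val"]))
    return averages


# Sample data copied from the question.
frames = [
    pd.DataFrame({"id_": [2, 2], "val": [5, 15]}),
    pd.DataFrame({"id_": [2, 3, 2, 3], "val": [5, 25, 4, 20]}),
    pd.DataFrame({"id_": [2, 3], "val": [10, 10]}),
]

print(get_averages(frames))  # [20.0, 38.25, 25.0]
```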

Define the function:

In [25]: def computeAverage(df, Id, val):
    ...:     result = df[Id].mul(df[val])
    ...:     result = result.mean()
    ...:     return result

Run a list comprehension over the list of dataframes (named df in the question), calling the function on each one through the DataFrame.pipe method:

In [28]: [frame.pipe(computeAverage, 'id_', 'val') for frame in df]
Out[28]: [20.0, 38.25, 25.0]
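
To reproduce this end to end, here is a self-contained sketch. The sample frames are rebuilt from the question's data; the list name frames and the snake_case function name are my own choices (the question calls the list df), and float() is added so the results are plain Python floats regardless of NumPy version. The second half is an alternative that is not part of the answer above: it stacks the frames with pd.concat, tags each with a key, and averages the products per key, which may be convenient when the list is long.

```python
import pandas as pd

# Sample data copied from the question.
frames = [
    pd.DataFrame({"id_": [2, 2], "val": [5, 15]}),
    pd.DataFrame({"id_": [2, 3, 2, 3], "val": [5, 25, 4, 20]}),
    pd.DataFrame({"id_": [2, 3], "val": [10, 10]}),
]


def compute_average(frame, id_col, val_col):
    # Element-wise product of the two columns, then the mean over rows.
    return float(frame[id_col].mul(frame[val_col]).mean())


# Approach from the answer: one pipe call per dataframe.
per_frame = [frame.pipe(compute_average, "id_", "val") for frame in frames]
print(per_frame)  # [20.0, 38.25, 25.0]

# Alternative (an assumption, not from the answer): stack the frames,
# label each with its position in the list, and take the mean of the
# products within each label.
combined = pd.concat(frames, keys=range(len(frames)))
stacked = (combined["id_"] * combined["val"]).groupby(level=0).mean().tolist()
print(stacked)  # [20.0, 38.25, 25.0]
```

Both approaches give the same numbers as the hand calculation in the question, since the mean of the products is the sum of the products divided by the row count.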
