简体   繁体   中英

With pandas, how to create a table of average ocurrences, using more than one column?

Say I have a df that looks like this:

    name  day  var_A   var_B
0   Pete  Wed      4       5
1   Luck  Thu      1      10
2   Pete  Sun     10      10

And I want to sum var_A and var_B for every name/person and then get the average of this sum by the number of ocurrences of that name/person.

Let's take Pete for example. Sum his variables (in this case, (4+10) + (5+10) = 29), and divide this sum by the ocurrences of Pete in the df (29/2 = 14,5). And the "day" column would be eliminated, there would be only one column for the name and another for the average. Would look like this:

>>> df.method().method()

        name   avg
    0   Pete  14.5
    1   Luck  11.0

I've been trying to do this using groupby and other methods, but I eventually got stuck. Any help would be appreciated.

I came up with

df.groupby('name')['var_A', 'var_B'].apply(lambda g: g.stack().sum()/len(g)).rename('avg').reset_index()

which produces the correct result, but I'm not sure it's the most elegant way.

pandas' groupby is a lazy expression, and as such it is reusable:


# create group
group = df.drop(columns="day").groupby("name")

# compute output
 group.sum().sum(1) / group.size()
 
name
Luck    11.0
Pete    14.5
dtype: float64

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM