简体   繁体   中英

Getting column mean in groupby clause python pandas

I have a dataset of actors and directors and the popularity of the movie that they have worked together in.

print (actors_director_df.head(3))

                 actor         director  popularity counter
0          Chris Pratt  Colin Trevorrow   32.985763       0
1  Bryce Dallas Howard  Colin Trevorrow   32.985763       0
2          Irrfan Khan  Colin Trevorrow   32.985763       0

I want to group by using actor and director because a pair can work in more than one film. which I successfully did use below query.

actor_director_grouped = actors_director_df.groupby(['actor','director']) \
                         .size() \
                         .reset_index(name='count') \
                         .sort_values(['count'], ascending=False) \
                         .head(10)

print (actor_director_grouped)

                      actor            director  count
3619         Clint Eastwood      Clint Eastwood     14
19272           Woody Allen         Woody Allen     12
9606            Johnny Depp          Tim Burton      8

But the popularity column goes missing in this DF.

What I want to do is to do a mean of popularity column after groupby and show the mean in front of actor and director along with the count of the number of movies they did together.

ie my ideal output would be something like this.

                      actor            director  popularity count
3619         Clint Eastwood      Clint Eastwood   32.985763    14
19272           Woody Allen         Woody Allen   5.1231231    12
9606            Johnny Depp          Tim Burton   3.1231231    8

Looking at your dataframe the counter columns seems unnecessary. Let us instead use the popularity column and make one mean and one count column:

import pandas as pd
import numpy as np

np.random.seed(444)

names = [
    'Robert Baratheon',
    'Jon Snow',
    'Daenerys Targaryen',
    'Theon Greyjoy',
    'Tyrion Lannister'
]

df = pd.DataFrame({
    'actor': np.random.choice(names, size=10, p = [0.2,0.2,0.2,0.1,0.3]),
    'director': np.random.choice(names, size=10, p = [0.4,0.1,0.1,0.1,0.3]),
    'popularity': np.random.randint(0,100, size=10),
    'counter': 0
})

df2 = df.groupby(['actor','director'])['popularity']\
        .agg(['count', 'mean'])\
        .reset_index()\
        .sort_values(by='mean', ascending=False)

print(df2)

Returns:

              actor          director  count  mean
0          Jon Snow  Robert Baratheon      2  53.5
5  Tyrion Lannister  Tyrion Lannister      2  49.0
2  Robert Baratheon  Tyrion Lannister      2  48.5
1  Robert Baratheon          Jon Snow      2  40.5
4     Theon Greyjoy  Tyrion Lannister      1  13.0
3     Theon Greyjoy  Robert Baratheon      1   7.0

I took to liberty to add some dummy data that would help understand the groupby clause better.

print(df)

Output:

                   actor           director  popularity  counter
0           Chris Pratt    Colin Trevorrow   32.985763        0
1   Bryce Dallas Howard    Colin Trevorrow   32.985763        0
2           Irrfan Khan    Colin Trevorrow   32.985763        0
3           Irrfan Khan    Colin Trevorrow   60.000000       12
4           Irrfan Khan       John Markson   10.000000       10
5           Irrfan Khan       Mark Johnson  100.000000        4

Then you need to groupby on actor and director and then find mean for popularity and sum for count .

g = df.groupby(['actor', 'director'], as_index=False)
count = g.size().reset_index(name='count')
grp = g.agg({'popularity':'mean'})
grp.merge(count)

Output:

                 actor         director  popularity  count
0  Bryce Dallas Howard  Colin Trevorrow   32.985763      1
1          Chris Pratt  Colin Trevorrow   32.985763      1
2          Irrfan Khan  Colin Trevorrow   46.492881      2
3          Irrfan Khan     John Markson   10.000000      1
4          Irrfan Khan     Mark Johnson  100.000000      1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM