Pandas- Find average based on multiple column value

Question

Goal is to get average(integer) of marks column based on name value. If id and name column appears with exact same value more than once, then the marks with corresponding name will be considered once. For eg average of x = (33+14+3)/3 = 16

Sample dataframe:

   id name  marks
0   1   x   33
1   1   x   33
2   2   y   9
3   3   x   14
4   4   y   55
5   4   y   55
6   5   x   3
7   6   z   31

Expected output:

   id name marks avg
0   1   x   33  16
1   1   x   33  16
2   2   y   9   32
3   3   x   14  16
4   4   y   55  32
5   4   y   55  32
6   5   x   3   16
7   6   z   31  31

I tried:

df["avg"] = df.groupby("name")["marks"].mean()

Answer 1

Compute mean for each name after drop duplicates (id, name) and map result value on name column:

df['avg'] = df['name'].map(df.drop_duplicates(['id', 'name']).groupby('name')['marks'].mean())
print(df)

# Output:
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000

Answer 2

Try this:

df = df.set_index('name').assign(avg=df[~df.set_index(['name', 'marks']).index.duplicated()].groupby('name')['marks'].mean()).reset_index()

Output:

>>> df
  name  id  marks        avg
0    x   1     33  16.666667
1    x   1     33  16.666667
2    y   2      9  32.000000
3    x   3     14  16.666667
4    y   4     55  32.000000
5    y   4     55  32.000000
6    x   5      3  16.666667
7    z   6     31  31.000000

If you need it rounded, chain .astype(int) to .mean() :

df = df.set_index('name').assign(avg=df[~df.set_index(['name', 'marks']).index.duplicated()].groupby('name')['marks'].mean().astype(int)).reset_index()

Output:

>>> df
  name  id  marks  avg
0    x   1     33   16
1    x   1     33   16
2    y   2      9   32
3    x   3     14   16
4    y   4     55   32
5    y   4     55   32
6    x   5      3   16
7    z   6     31   31

Answer 3

One option, which uses the same drop_duplicates idea, without using a groupby, is to pivot the deduplicated data:

df.assign(avg = df.name.map(df.drop_duplicates().pivot('name', 'id', 'marks').mean(1)))
 
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000

Pandas- Find average based on multiple column value

Question

3 answers

solution1
4 2021-12-08 19:48:03

solution2
0 2021-12-08 19:32:36

solution3
0 2021-12-08 20:43:14

Pandas- Find average based on multiple column value

Question

3 answers

solution1 4 2021-12-08 19:48:03

solution2 0 2021-12-08 19:32:36

solution3 0 2021-12-08 20:43:14

solution1
4 2021-12-08 19:48:03

solution2
0 2021-12-08 19:32:36

solution3
0 2021-12-08 20:43:14