Pandas: Find the average based on multiple column values

Question

The goal is to get the average (as an integer) of the marks column for each name value. If the same (id, name) pair appears more than once, the marks for that pair should be counted only once. For example, the average for x = (33+14+3)/3 = 16.67, which is 16 as an integer.

Sample dataframe:

   id name  marks
0   1   x   33
1   1   x   33
2   2   y   9
3   3   x   14
4   4   y   55
5   4   y   55
6   5   x   3
7   6   z   31

Expected output:

   id name marks avg
0   1   x   33  16
1   1   x   33  16
2   2   y   9   32
3   3   x   14  16
4   4   y   55  32
5   4   y   55  32
6   5   x   3   16
7   6   z   31  31

I tried:

df["avg"] = df.groupby("name")["marks"].mean()
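For context, a likely reason this attempt does not give the expected column: `groupby("name")["marks"].mean()` returns one value per group, indexed by the name values, while column assignment aligns on the dataframe's own row index. It also averages the duplicated rows. A minimal sketch, rebuilding the sample dataframe by hand (the variable names `per_name` and `attempt` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4, 5, 6],
    "name": ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# One mean per group, indexed by the name values.
per_name = df.groupby("name")["marks"].mean()
print(per_name.index.tolist())  # ['x', 'y', 'z']

# Assigning the Series as a column aligns on index labels. The dataframe's
# index is 0..7, so no label matches and every row becomes NaN.
attempt = df.assign(avg=per_name)
print(attempt["avg"].isna().all())  # True
```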

Answer 1

Compute the mean for each name after dropping duplicate (id, name) pairs, then map the result onto the name column:

df['avg'] = df['name'].map(df.drop_duplicates(['id', 'name']).groupby('name')['marks'].mean())
print(df)

# Output:
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000
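Since the question asks for an integer average, the same approach can be followed by a cast. The sketch below rebuilds the sample dataframe by hand and assumes truncation is acceptable, which matches the expected output (16 for 16.67); `per_name` is just an illustrative intermediate name.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4, 5, 6],
    "name": ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# Keep one row per (id, name) pair, then average marks per name.
per_name = df.drop_duplicates(["id", "name"]).groupby("name")["marks"].mean()

# Look up each row's name in the per-name means; astype(int) truncates.
df["avg"] = df["name"].map(per_name).astype(int)
print(df["avg"].tolist())  # [16, 16, 32, 16, 32, 32, 16, 31]
```

Note that `astype(int)` raises an error if any name maps to NaN, so this assumes every name has at least one non-missing mark.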

Answer 2

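Try this. Note that it drops duplicates on (name, marks) rather than (id, name), which gives the same result for the sample data: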

df = df.set_index('name').assign(avg=df[~df.set_index(['name', 'marks']).index.duplicated()].groupby('name')['marks'].mean()).reset_index()

Output:

>>> df
  name  id  marks        avg
0    x   1     33  16.666667
1    x   1     33  16.666667
2    y   2      9  32.000000
3    x   3     14  16.666667
4    y   4     55  32.000000
5    y   4     55  32.000000
6    x   5      3  16.666667
7    z   6     31  31.000000

If you need an integer, chain .astype(int) to .mean() (this truncates rather than rounds):

df = df.set_index('name').assign(avg=df[~df.set_index(['name', 'marks']).index.duplicated()].groupby('name')['marks'].mean().astype(int)).reset_index()

Output:

>>> df
  name  id  marks  avg
0    x   1     33   16
1    x   1     33   16
2    y   2      9   32
3    x   3     14   16
4    y   4     55   32
5    y   4     55   32
6    x   5      3   16
7    z   6     31   31
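One thing worth checking with this answer is the choice of deduplication key. The code above removes duplicates on (name, marks), which matches the question's (id, name) rule on the sample data but can differ when two distinct ids share the same name and marks. A small sketch with hypothetical data (not from the question), written with `drop_duplicates` for brevity:

```python
import pandas as pd

# Hypothetical data: ids 1 and 2 are distinct records with equal name and marks.
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "name": ["x", "x", "x", "x"],
    "marks": [33, 33, 33, 14],
})

# Question's rule: one row per (id, name) pair -> marks 33, 33, 14.
by_id_name = df.drop_duplicates(["id", "name"]).groupby("name")["marks"].mean()

# Key used in this answer: one row per (name, marks) pair -> marks 33, 14.
by_name_marks = df.drop_duplicates(["name", "marks"]).groupby("name")["marks"].mean()

print(by_id_name["x"])     # (33 + 33 + 14) / 3
print(by_name_marks["x"])  # (33 + 14) / 2 = 23.5
```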

Answer 3

One option, which uses the same drop_duplicates idea without a groupby, is to pivot the deduplicated data:

df.assign(avg = df.name.map(df.drop_duplicates().pivot(index='name', columns='id', values='marks').mean(axis=1)))
 
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000
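To see why the pivot works, it can help to look at the intermediate wide table: one row per name, one column per id, with NaN where a name has no entry for that id. The row-wise mean skips NaN by default, so each name is averaged over its own ids only. A sketch on a hand-built copy of the sample data (the names `wide`, `per_name` and `result` are illustrative), using keyword arguments for `pivot`, which pandas 2.0 and later require:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4, 5, 6],
    "name": ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# Drop fully identical rows, then spread marks into a name-by-id table.
wide = df.drop_duplicates().pivot(index="name", columns="id", values="marks")
print(wide)

# Row-wise mean; NaN cells (ids that belong to other names) are skipped.
per_name = wide.mean(axis=1)

result = df.assign(avg=df["name"].map(per_name))
print(result)
```

Because `pivot` raises an error when an (index, columns) pair occurs more than once, this assumes that rows sharing an id and name also share the same marks.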

3 answers

Answer 1
4 2021-12-08 19:48:03

Answer 2
0 2021-12-08 19:32:36

Answer 3
0 2021-12-08 20:43:14
