
Pandas: find average based on multiple column values

The goal is to get the average (as an integer) of the marks column for each name value. If the same id and name pair appears more than once, the corresponding marks should be counted only once. For example, the average for x = (33+14+3)/3 = 16 (16.67 truncated to an integer).

Sample dataframe:

   id name  marks
0   1   x   33
1   1   x   33
2   2   y   9
3   3   x   14
4   4   y   55
5   4   y   55
6   5   x   3
7   6   z   31

Expected output:

   id name marks avg
0   1   x   33  16
1   1   x   33  16
2   2   y   9   32
3   3   x   14  16
4   4   y   55  32
5   4   y   55  32
6   5   x   3   16
7   6   z   31  31

I tried:

df["avg"] = df.groupby("name")["marks"].mean()
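For context, here is a minimal sketch of why this attempt does not produce the expected column. It rebuilds the sample dataframe from the question. The names per_name and all_missing are introduced only for illustration. The grouped mean is indexed by name, while df is indexed by row number, so the assignment finds no matching labels. The attempt also does not deduplicate the (id, name) pairs.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 2, 3, 4, 4, 5, 6],
    "name":  ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# One mean per group, indexed by the group key ("x", "y", "z").
# Duplicated (id, name) rows are still counted here.
per_name = df.groupby("name")["marks"].mean()

# Column assignment aligns on index labels. df is labeled 0..7 and
# per_name is labeled "x"/"y"/"z", so no label matches.
df["avg"] = per_name
all_missing = df["avg"].isna().all()
print(per_name)
print(all_missing)
```

The result needs to be mapped back onto the name column (or computed with a row-aligned operation) rather than assigned directly.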

Compute the mean for each name after dropping duplicate (id, name) pairs, then map the result onto the name column:

df['avg'] = df['name'].map(df.drop_duplicates(['id', 'name']).groupby('name')['marks'].mean())
print(df)

# Output:
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000
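The same approach can be broken into steps, with an integer conversion added to match the expected output in the question. This is a sketch against the sample data only. The intermediate names unique_rows and avg_by_name are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 2, 3, 4, 4, 5, 6],
    "name":  ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# Keep one row per (id, name) pair so repeated rows count once.
unique_rows = df.drop_duplicates(["id", "name"])

# Mean marks per name, truncated to an integer (16.67 becomes 16).
avg_by_name = unique_rows.groupby("name")["marks"].mean().astype(int)

# Broadcast the per-name value back onto every original row.
df["avg"] = df["name"].map(avg_by_name)
print(df)
```

Because map looks up each row's name in avg_by_name, the duplicated rows still receive the group value even though they were excluded from the mean.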

Try this:

df = df.set_index('name').assign(avg=df[~df.set_index(['id', 'name']).index.duplicated()].groupby('name')['marks'].mean()).reset_index()

Output:

>>> df
  name  id  marks        avg
0    x   1     33  16.666667
1    x   1     33  16.666667
2    y   2      9  32.000000
3    x   3     14  16.666667
4    y   4     55  32.000000
5    y   4     55  32.000000
6    x   5      3  16.666667
7    z   6     31  31.000000

If you need an integer, chain .astype(int) after .mean() (note that this truncates the fractional part rather than rounding to the nearest integer):

df = df.set_index('name').assign(avg=df[~df.set_index(['id', 'name']).index.duplicated()].groupby('name')['marks'].mean().astype(int)).reset_index()

Output:

>>> df
  name  id  marks  avg
0    x   1     33   16
1    x   1     33   16
2    y   2      9   32
3    x   3     14   16
4    y   4     55   32
5    y   4     55   32
6    x   5      3   16
7    z   6     31   31
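A small sketch of the difference between truncating and rounding, using the per-name means from the sample data. The Series named means is constructed by hand here for illustration. The question's expected value of 16 for x corresponds to truncation.

```python
import pandas as pd

# Per-name means taken from the outputs in this thread.
means = pd.Series([16.666667, 32.0, 31.0], index=["x", "y", "z"])

truncated = means.astype(int)        # drops the fractional part
rounded = means.round().astype(int)  # rounds to the nearest integer first

print(truncated.tolist())
print(rounded.tolist())
```

If nearest-integer rounding is wanted instead, calling .round() before .astype(int) is one way to get it.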

Another option, which uses the same drop_duplicates idea but avoids groupby, is to pivot the deduplicated data and take the row-wise mean:

df.assign(avg=df['name'].map(df.drop_duplicates().pivot(index='name', columns='id', values='marks').mean(axis=1)))
 
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000
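To see what the pivot step produces, here is a sketch that exposes the intermediate table. The names wide, row_means and result are illustrative. Keyword arguments are used for pivot, since recent pandas versions no longer accept them positionally.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 2, 3, 4, 4, 5, 6],
    "name":  ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# One row per name, one column per id; cells with no (name, id) pair are NaN.
wide = df.drop_duplicates().pivot(index="name", columns="id", values="marks")

# mean skips NaN by default, so each row averages only its own ids.
row_means = wide.mean(axis=1)

result = df.assign(avg=df["name"].map(row_means))
print(wide)
print(result)
```

One caveat, as far as this sketch goes: drop_duplicates() without arguments compares whole rows, so if the same (id, name) pair ever carried two different marks values, pivot would raise an error about duplicate entries.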
