Pandas: find average based on multiple column values
The goal is to get the average (as an integer) of the marks column for each name value. If the same id and name pair appears more than once, the corresponding marks should be counted only once. For example, the average for x is (33+14+3)/3 = 16.67, which is 16 as an integer.
Sample dataframe:
id name marks
0 1 x 33
1 1 x 33
2 2 y 9
3 3 x 14
4 4 y 55
5 4 y 55
6 5 x 3
7 6 z 31
Expected output:
id name marks avg
0 1 x 33 16
1 1 x 33 16
2 2 y 9 32
3 3 x 14 16
4 4 y 55 32
5 4 y 55 32
6 5 x 3 16
7 6 z 31 31
I tried:
df["avg"] = df.groupby("name")["marks"].mean()
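A likely reason the attempt above does not give the expected column: the grouped mean is indexed by name (x, y, z), while the dataframe is indexed 0 to 7, so the assignment aligns on mismatched labels. It also never removes the repeated (id, name) pairs. A minimal sketch that rebuilds the sample dataframe to show both effects (the names `per_name`, `attempt` and `naive` are made up for this illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4, 5, 6],
    "name": ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# The result is indexed by name (x, y, z), not by the row labels 0..7.
per_name = df.groupby("name")["marks"].mean()

# Assigning it as a column aligns on the index, so no label matches.
attempt = df.assign(avg=per_name)
print(attempt["avg"].isna().all())

# transform broadcasts the mean back to every row, but the repeated
# (id, name) pairs are still counted, so x is averaged over four rows.
naive = df.groupby("name")["marks"].transform("mean")
print(naive.iloc[0])
```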
Compute the mean for each name after dropping duplicate (id, name) pairs, then map the result onto the name column:
df['avg'] = df['name'].map(df.drop_duplicates(['id', 'name']).groupby('name')['marks'].mean())
print(df)
# Output:
id name marks avg
0 1 x 33 16.666667
1 1 x 33 16.666667
2 2 y 9 32.000000
3 3 x 14 16.666667
4 4 y 55 32.000000
5 4 y 55 32.000000
6 5 x 3 16.666667
7 6 z 31 31.000000
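For reference, a self-contained sketch of the same approach that rebuilds the sample dataframe from the question and casts the mean to an integer so the column matches the expected output. The intermediate name `means` is introduced here only for readability.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4, 5, 6],
    "name": ["x", "x", "y", "x", "y", "y", "x", "z"],
    "marks": [33, 33, 9, 14, 55, 55, 3, 31],
})

# Keep one row per (id, name) pair, then average marks per name.
means = df.drop_duplicates(["id", "name"]).groupby("name")["marks"].mean()

# astype(int) truncates the fractional part (16.67 becomes 16);
# map looks up each row's name in the per-name result.
df["avg"] = df["name"].map(means.astype(int))
print(df)
```

Splitting the expression in two keeps the deduplication step visible, which makes it easier to change the key columns later.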
Try this:
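# Deduplicate on (id, name), as the question asks; (name, marks) would also merge different ids that share a mark.
df = df.set_index('name').assign(avg=df[~df.set_index(['id', 'name']).index.duplicated()].groupby('name')['marks'].mean()).reset_index()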
Output:
>>> df
name id marks avg
0 x 1 33 16.666667
1 x 1 33 16.666667
2 y 2 9 32.000000
3 x 3 14 16.666667
4 y 4 55 32.000000
5 y 4 55 32.000000
6 x 5 3 16.666667
7 z 6 31 31.000000
If you need an integer, chain .astype(int) to .mean(). Note that this truncates the fractional part rather than rounding to the nearest integer:
# Same as above, deduplicating on (id, name), with the mean cast to int.
df = df.set_index('name').assign(avg=df[~df.set_index(['id', 'name']).index.duplicated()].groupby('name')['marks'].mean().astype(int)).reset_index()
Output:
>>> df
name id marks avg
0 x 1 33 16
1 x 1 33 16
2 y 2 9 32
3 x 3 14 16
4 y 4 55 32
5 y 4 55 32
6 x 5 3 16
7 z 6 31 31
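The question's expected value of 16 for x corresponds to truncation, which is what .astype(int) does. If nearest-integer rounding is wanted instead, .round() can be applied before the cast. A small sketch of the difference, using a hand-built Series of the per-name means (the names `means`, `truncated` and `rounded` are illustrative only):

```python
import pandas as pd

# Per-name means from the sample data: x = 50/3, y = 32, z = 31.
means = pd.Series({"x": 50 / 3, "y": 32.0, "z": 31.0})

truncated = means.astype(int)        # drops the fractional part
rounded = means.round().astype(int)  # rounds to the nearest integer first

print(truncated["x"], rounded["x"])
```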
One option, which uses the same drop_duplicates idea without a groupby, is to pivot the deduplicated data:
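# pivot takes keyword-only arguments in pandas 2.0 and later; mean(axis=1) skips the NaN cells.
df.assign(avg = df.name.map(df.drop_duplicates().pivot(index='name', columns='id', values='marks').mean(axis=1)))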
id name marks avg
0 1 x 33 16.666667
1 1 x 33 16.666667
2 2 y 9 32.000000
3 3 x 14 16.666667
4 4 y 55 32.000000
5 4 y 55 32.000000
6 5 x 3 16.666667
7 6 z 31 31.000000