[英]Pandas dataframe: Group by two columns and then average over another column
Assuming that I have a dataframe with the following values: 假设我有一个具有以下值的数据框:
df:
col1 col2 value
1 2 3
1 2 1
2 3 1
I want to first groupby my dataframe based on the first two columns (col1 and col2) and then average over values of the thirs column (value). 我想首先根据前两列(col1和col2)对数据框进行分组,然后对第三列的值(值)进行平均。 So the desired output would look like this: 因此,所需的输出将如下所示:
col1 col2 avg-value
1 2 2
2 3 1
I am using the following code: 我正在使用以下代码:
columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
print(df[['col1','col2','avg']].groupby('col1','col2').mean())
which gets the following error: 出现以下错误:
ValueError: No axis named col2 for object type <class 'pandas.core.frame.DataFrame'>
Any help would be much appreciated. 任何帮助将非常感激。
You need to pass a list of the columns to groupby, what you passed was interpreted as the axis
param which is why it raised an error: 您需要将列的列表传递给groupby,您传递的内容被解释axis
参数,这就是它引发错误的原因:
In [30]:
columns = ['col1','col2','avg']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
print(df[['col1','col2','avg']].groupby(['col1','col2']).mean())
avg
col1 col2
1 2 3
3 3
If you want to group by multiple columns, you should put them in a list: 如果要按多列分组,则应将它们放在列表中:
columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
print(df.groupby(['col1','col2']).mean())
Or slightly more verbose, for the sake of getting the word 'avg' in your aggregated dataframe: 或稍微冗长一些,以便在聚合数据框中使用单词“ avg”:
import numpy as np
columns = ['col1','col2','value']
df = pd.DataFrame(columns=columns)
df.loc[0] = [1,2,3]
df.loc[1] = [1,3,3]
df.loc[2] = [2,3,1]
print(df.groupby(['col1','col2']).agg({'value': {'avg': np.mean}}))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.