DataFrame: Group by one column and average other columns

Question

Say I have the following DataFrame:

data = pd.DataFrame({'id' : ['1','2','3','4','5'], 'group' : ['1','1','2','1','2'], 
      'state' : ['True','False','False','True','True'], 'value' : [11,12,5,8,3]})

I would like to create a new DataFrame, keeping 3 columns: groups ( '1' or '2' ), and averaging over the columns 'state' and 'value' , hence the DataFrame would be:

grouped_averaged = pd.DataFrame({'group' : ['1','2'], 'average_state' : [0.66,0.5], 'value' : [7,3]})

Answer 1

You just need groupby :

data['state'] = data['state'].eq('True')
data.drop('id',axis=1).groupby('group', as_index=False).mean()

Output:

  group     state      value
0     1  0.666667  10.333333
1     2  0.500000   4.000000

Answer 2

data.groupby('group').agg({('average_state', 'mean')})

Answer 3

You should first create a filtered dataframe that filters your required dataframe. The algorithm would be to first create a list of values that you want to filter with then you would change the value of True and False to 1 and 0 in state and then group them with an aggregate function.

df = pd.DataFrame({'id' : ['1','2','3','4','5'], 'group' : ['1','1','2','1','2'], 
      'state' : ['True','False','False','True','True'], 'value' : [11,12,5,8,3]})
filter_values=['1','2']
df=df.loc[df['group'].isin(filter_values)]
df['state']=(df['state']=="True").astype(int)
df['state']=(df['state']=="False").astype(int)
aggregate_functions={'state':'mean','value':'mean'}
clean_df=df.groupby(['group']).aggregate(aggregate_functions)

I haven't ran it on my pc but you can test it but this algorithm should work.

DataFrame: Group by one column and average other columns

Question

3 answers

solution1
2 ACCPTED 2020-09-23 14:02:20

solution2
1 2020-09-23 14:05:39

solution3
0 2020-09-23 14:12:41

DataFrame: Group by one column and average other columns

Question

3 answers

solution1 2 ACCPTED 2020-09-23 14:02:20

solution2 1 2020-09-23 14:05:39

solution3 0 2020-09-23 14:12:41

solution1
2 ACCPTED 2020-09-23 14:02:20

solution2
1 2020-09-23 14:05:39

solution3
0 2020-09-23 14:12:41