How to create a dataframe of summary statistics?

I have a dataframe with IDs and numerous test results relating to each ID. What I want to do is create a second dataframe which summarises the average score and the standard deviation for a particular test, which I can then plot on a graph.

Below is the code I have so far. It raises "ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements".

Can anyone help?


    df2 = df1.groupby(['id'], as_index=True).agg({'variable_1':['mean'], 'variable_1':['std']})
    df2.columns=['var_mean','var_std']
    df2.plot(x='var_mean', y='var_std', kind='scatter', figsize=(15,10), title='Standard Deviation of Std vs Mean')


example data:

ID    Variable_1
1234  32
1234  23
2345  54
2345  65
2345  76
3456  78

what I'd like:

ID    Mean  SD
1234  23.5  2.2
2345  45    9
...
...

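The error comes from the dict itself: a Python dict cannot hold the same key twice, so in {'variable_1':['mean'], 'variable_1':['std']} the second entry overwrites the first. agg therefore returns a single column, and assigning two column names to it fails with the length mismatch.

Instead, select the column and pass both functions to agg in one call, naming the output columns with keyword arguments (named aggregation, available from pandas 0.25). The older form of passing a renaming dict such as {'Mean': np.mean, 'SD': np.std} was removed in pandas 1.0:

In [154]:

df.groupby('ID')['Variable_1'].agg(Mean='mean', SD='std')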
Out[154]:
      Mean         SD
ID                   
1234  27.5   6.363961
2345  65.0  11.000000
3456  78.0        NaN
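
For reference, here is a minimal end-to-end sketch that rebuilds the sample data from the question and produces the summary frame. It assumes pandas 0.25 or later; the names df and summary are illustrative, not taken from the asker's real data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1234, 1234, 2345, 2345, 2345, 3456],
    "Variable_1": [32, 23, 54, 65, 76, 78],
})

# Named aggregation: each keyword becomes an output column,
# and its value is the function applied to Variable_1 per ID.
summary = df.groupby("ID")["Variable_1"].agg(Mean="mean", SD="std")

# Turn the ID index back into an ordinary column so the result
# is a flat frame with columns ID, Mean and SD.
summary = summary.reset_index()

print(summary)
```

The SD for ID 3456 is NaN because pandas computes the sample standard deviation (ddof=1), which is undefined for a single observation. If matplotlib is installed, summary.plot(x='Mean', y='SD', kind='scatter') should give the scatter plot the question asks for.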
