Calculating the mean and standard deviation in a pandas dataframe

Question

I have the following dataframe:

    COD     CHM     DATE
0   5713    0.0     2020-07-16
1   5713    1.0     2020-08-11
2   5713    2.0     2020-06-20
3   5713    3.0     2020-06-19
4   5713    4.0     2020-06-01
... ... ... ...
2135283 73306036    0.0     2020-09-30
2135284 73306055    12.0    2020-09-30
2135285 73306479    9.0     2020-09-30
2135286 73306656    3.0     2020-09-30
2135287 73306676    1.0     2020-09-30

I want to calculate the mean and the standard deviation of CHM for each COD across the dates (over time). For this, I am doing:

    traf_user_chm_med = traf_user_chm_med.groupby(['COD', 'DATE'])['CHM'].sum().reset_index()
    dates = pd.date_range(start=traf_user_chm_med.DATE.min(), end=traf_user_chm_med.DATE.max(), freq='MS', closed='left').sort_values(ascending=False)
    clients = traf_user_chm_med['COD'].unique()
    idx = pd.MultiIndex.from_product((clients, dates), names=['COD', 'DATE'])
    M0 = pd.to_datetime('2020-08')
    M1 = M0-pd.DateOffset(month=M0.month-1)
    M2 = M0-pd.DateOffset(month=M0.month-2)
    M3 = M0-pd.DateOffset(month=M0.month-3)
    M4 = M0-pd.DateOffset(month=M0.month-4)
    M5 = M0-pd.DateOffset(month=M0.month-5)
    def filter_dates(grp):
        grp.set_index('YEAR_MONTH', inplace=True)
        grp=grp[M0:M5].reset_index()
        return grp
    traf_user_chm_med = traf_user_chm_med.groupby('COD').apply(filter_dates)

I'm not sure why it doesn't work: it returns an empty dataframe. After this I would unstack to get the activity in each of the months and calculate the mean and standard deviation for each COD. This is a long process, and I'm not sure whether there is a faster way to get the values I want. Still, if anyone can help me get this one working, that would be awesome!

Answer 1

If I understand correctly, you simply need this:

df.groupby("COD")["CHM"].agg("std")

As a general principle, there's almost always a "pythonic" way to do these things that takes fewer lines and is easy to understand!
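The one-liner above returns only the standard deviation. As a minimal sketch, assuming the goal is both statistics of CHM per COD, `agg` can take a list of function names. The toy values and the date window below are made up for illustration, and the window filter is optional:

```python
import pandas as pd

# Toy data with the same columns as the question (values are made up).
df = pd.DataFrame({
    "COD": [5713, 5713, 5713, 73306036, 73306036],
    "CHM": [0.0, 1.0, 2.0, 4.0, 8.0],
    "DATE": pd.to_datetime(
        ["2020-07-16", "2020-08-11", "2020-06-20", "2020-09-30", "2020-08-30"]
    ),
})

# Optional: keep only rows inside a date window before aggregating.
# The window bounds are an assumption chosen for this example.
start, end = pd.Timestamp("2020-06-01"), pd.Timestamp("2020-10-01")
window = df[(df["DATE"] >= start) & (df["DATE"] < end)]

# One row per COD, with the mean and the sample standard deviation
# (pandas uses ddof=1 by default) of CHM.
stats = window.groupby("COD")["CHM"].agg(["mean", "std"])
print(stats)
```

A boolean mask on DATE avoids setting and slicing an index per group, so the result does not depend on how the dates are sorted.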

Answer 2

df['mean'] = df.groupby('DATE')['COD'].transform('mean')

Answer 3

You can use transform to broadcast your mean and std:

...
df['mean'] = df.groupby('DATE')['COD'].transform('mean')
df['std'] = df.groupby('DATE')['COD'].transform('std')
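Note that the snippet above groups by DATE and averages the COD column. If the goal is the per-COD statistics of CHM described in the question (an assumption about the intent), the grouping key and the value column would be swapped. A minimal sketch with made-up values:

```python
import pandas as pd

# Toy data (values are made up); the column names follow the question.
df = pd.DataFrame({
    "COD": [5713, 5713, 5713, 73306036, 73306036],
    "CHM": [0.0, 1.0, 2.0, 4.0, 8.0],
})

# transform returns a result aligned with the original rows, so every row
# of a given COD receives that COD's mean and sample standard deviation.
grouped = df.groupby("COD")["CHM"]
df["mean"] = grouped.transform("mean")
df["std"] = grouped.transform("std")
print(df)
```

Unlike `agg`, which returns one row per COD, `transform` keeps the original number of rows, which is handy when the statistics are needed next to each observation.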

3 answers

Answer 1
0 2021-03-04 17:29:14

Answer 2
0 accepted 2021-03-04 19:03:02

Answer 3
0 2021-03-04 19:10:12
