Calculate mean and standard deviation in a time-series

Question

I have the following dataframe:

    COD     ACT     DATE
0   5713    1.0     2020-07-16
1   5713    1.0     2020-08-11
2   5713    1.0     2020-06-20
3   5713    1.0     2020-06-19
4   5713    1.0     2020-06-01
5   23369   1.0     2020-07-17
6   23369   1.0     2020-08-07
7   23369   1.0     2020-09-02
8   23369   1.0     2020-11-22
9   32012   1.0     2020-06-02
10  32012   1.0     2020-07-26

I want to calculate the mean and standard deviation for each COD over the whole time series. Previously I was calculating it like this:

df['MEAN'] = df.groupby("COD")["ACT"].transform("mean")
df['STD'] = df.groupby("COD")["ACT"].transform("std")

But this only covers the time span between the first and last ACT timestamp of each COD (for example, 3 ACT within 5 months, not 8 months). ACT is the timestamp for the activity, but the whole time series spans 8 months. I want to calculate the mean and standard deviation over the whole 8 months. Can anyone help me?

Answer 1

What you're looking for is an apply function on the groupby. Make sure to convert the DATE column to a datetime object.

df.groupby("COD").apply(lambda x: x["ACT"].mean())
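As a minimal sketch of the two steps above (the datetime conversion and the per-COD statistics), assuming the dataframe from the question is rebuilt by hand and that a named aggregation is acceptable in place of the lambda:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COD": [5713] * 5 + [23369] * 4 + [32012] * 2,
    "ACT": [1.0] * 11,
    "DATE": [
        "2020-07-16", "2020-08-11", "2020-06-20", "2020-06-19", "2020-06-01",
        "2020-07-17", "2020-08-07", "2020-09-02", "2020-11-22",
        "2020-06-02", "2020-07-26",
    ],
})

# Convert the DATE strings to real datetime values.
df["DATE"] = pd.to_datetime(df["DATE"])

# One row per COD with the mean and sample standard deviation of ACT.
stats = df.groupby("COD")["ACT"].agg(["mean", "std"])
print(stats)
```

Note that this still only looks at rows that exist, so periods without any activity are not counted.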

It might also help to get a month-wise sum and mean analysis for every COD.
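One possible way to do a month-wise analysis over the whole series is sketched below. It assumes that a month with no rows for a COD should count as zero activity, and it takes the span from the earliest and latest DATE in the data (6 months in this sample); for a fixed 8-month window you would pass your own start and end to pd.period_range. The names all_months, monthly and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COD": [5713] * 5 + [23369] * 4 + [32012] * 2,
    "ACT": [1.0] * 11,
    "DATE": pd.to_datetime([
        "2020-07-16", "2020-08-11", "2020-06-20", "2020-06-19", "2020-06-01",
        "2020-07-17", "2020-08-07", "2020-09-02", "2020-11-22",
        "2020-06-02", "2020-07-26",
    ]),
})

# Bucket every row into its calendar month.
df["MONTH"] = df["DATE"].dt.to_period("M")

# Every month in the series, including months with no activity at all.
# Assumption: the span is the data's own min/max; replace with fixed bounds if needed.
all_months = pd.period_range(df["MONTH"].min(), df["MONTH"].max(), freq="M")

# Month-wise sum of ACT per COD, with missing months filled with 0.
monthly = (
    df.groupby(["COD", "MONTH"])["ACT"].sum()
      .unstack(fill_value=0.0)
      .reindex(columns=all_months, fill_value=0.0)
)

# Mean and standard deviation per COD across all months of the span.
result = pd.DataFrame({
    "MEAN": monthly.mean(axis=1),
    "STD": monthly.std(axis=1),
})
print(monthly)
print(result)
```

To attach these values back to the original rows, something like df.join(result, on="COD") should work.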

solution1
1 ACCEPTED 2021-03-19 14:23:26