I'm having a bit of trouble with my pandas data frame. I would like to compare measurements with the previous month's measurements. For this I need extra columns holding the mean and standard deviation of the previous month.
I have the following table:
                         Time   Value 1   Value 2   Value 3   Value 4
0     2020-04-01 03:42:51.531  9.189975  6.475000  3.962500  6.100006
1     2020-04-06 05:42:39.778  8.799253  7.300000  3.775000  6.119995
2     2020-04-06 06:45:55.211  8.824507  7.250000  3.600000  6.100006
3     2020-04-06 18:53:15.861  8.132523  6.312500  3.275000  6.100006
4     2020-04-07 05:39:54.373  8.772517  6.887500  3.962500  6.100006
...                       ...       ...       ...       ...       ...
17271 2021-03-31 22:12:32.374  9.012240  7.375000  3.750000  6.179993
17272 2021-03-31 22:43:51.906  9.038265  7.225000  3.800000  6.200012
17273 2021-03-31 23:12:27.061  9.091208  7.137500  3.887500  6.179993
17274 2021-03-31 23:44:14.439  9.109208  7.287500  3.962500  6.199997
17275 2021-04-01 00:00:00.000  9.111931  7.274812  3.973665  6.198373
For each of the four measurements per time step, I want an additional column (so a total of 8 extra columns) with the mean and standard deviation of the previous month. So for example, for each Value 1 measurement in January 2021, I want the Value 1 mean and standard deviation of December 2020.
I've been working on this for several days, but I can't manage to write working Python code for it. I hope someone can help me out. Thanks in advance!
You should be able to use the code below to achieve your objective. It computes the mean and standard deviation for each month, then looks up the previous month's values by merging on a key derived from the Time column:
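import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Lookup key for each row: the last day of the month before its timestamp.
previous_month = df["Time"].dt.normalize() - MonthEnd(1)

# Monthly statistics, indexed by month-end date.
# Note: pandas 2.2+ spells this frequency "ME"; "M" is deprecated there.
monthly = df.groupby(pd.Grouper(key="Time", freq="M"))
mu = monthly.mean()
sigma = monthly.std()

# Attach the previous month's mean to every row.
df = df.merge(
    mu,
    how="left",
    left_on=previous_month,
    right_index=True,
    suffixes=("", "_prev_mean"),
)
# Attach the previous month's standard deviation the same way.
df = df.merge(
    sigma,
    how="left",
    left_on=previous_month,
    right_index=True,
    suffixes=("", "_prev_std"),
)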
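To see why the merge key lines up with the monthly statistics, here is a small sketch on three made-up timestamps (the timestamps and values are illustrative, not taken from the question). It assumes that passing the MonthEnd() offset object as freq behaves like the month-end alias used above, which sidesteps the alias spelling difference between pandas versions.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Three illustrative timestamps: mid-month, last second of a month,
# and the first instant of the following month.
times = pd.Series(pd.to_datetime([
    "2021-01-15 08:30:00",
    "2021-01-31 23:59:59",
    "2021-02-01 00:00:00",
]))

# Drop the time of day, then roll back to the last day of the previous month.
previous_month = times.dt.normalize() - MonthEnd(1)
print(previous_month.tolist())

# Monthly groups are labelled with the month-end date at midnight,
# which is exactly the kind of value the key above produces.
df = pd.DataFrame({"Time": times, "Value 1": [1.0, 2.0, 3.0]})
monthly = df.groupby(pd.Grouper(key="Time", freq=MonthEnd())).mean()
print(monthly)
```

Both January rows map to 2020-12-31 and the February row maps to 2021-01-31, so each row's key matches the index label of the month before it.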
If your data looks like this:
                          Time    Value 1    Value 2    Value 3    Value 4
0   2021-01-01 01:37:49.148748   0.014568   0.041711   0.009694   0.047044
1   2021-01-01 03:29:24.939551   0.032042   0.073345   0.014901   0.051690
2   2021-01-01 06:00:53.871182   0.040758   0.105496   0.046904   0.073747
3   2021-01-01 16:59:30.672400   0.061262   0.113711   0.083658   0.073939
4   2021-01-02 01:36:59.195226   0.090762   0.115689   0.087191   0.081972
..                         ...        ...        ...        ...        ...
495 2021-04-18 05:26:41.805694  10.883107  11.917340  12.850949  13.834590
496 2021-04-18 11:52:30.124759  10.889271  11.946243  12.860569  13.870959
497 2021-04-18 13:27:59.735432  10.932131  11.977409  12.949012  13.929994
498 2021-04-18 18:58:02.280739  10.979734  11.988028  12.952918  13.991210
499 2021-04-18 19:17:01.745781  10.997603  11.995105  12.991302  13.995131
[500 rows x 5 columns]
It will come out looking like this:
                          Time    Value 1    Value 2    Value 3    Value 4  Value 1_prev_mean  Value 2_prev_mean  Value 3_prev_mean  Value 4_prev_mean  Value 1_prev_std  Value 2_prev_std  Value 3_prev_std  Value 4_prev_std
0   2021-01-01 01:37:49.148748   0.014568   0.041711   0.009694   0.047044                NaN                NaN                NaN                NaN               NaN               NaN               NaN               NaN
1   2021-01-01 03:29:24.939551   0.032042   0.073345   0.014901   0.051690                NaN                NaN                NaN                NaN               NaN               NaN               NaN               NaN
2   2021-01-01 06:00:53.871182   0.040758   0.105496   0.046904   0.073747                NaN                NaN                NaN                NaN               NaN               NaN               NaN               NaN
3   2021-01-01 16:59:30.672400   0.061262   0.113711   0.083658   0.073939                NaN                NaN                NaN                NaN               NaN               NaN               NaN               NaN
4   2021-01-02 01:36:59.195226   0.090762   0.115689   0.087191   0.081972                NaN                NaN                NaN                NaN               NaN               NaN               NaN               NaN
..                         ...        ...        ...        ...        ...                ...                ...                ...                ...               ...               ...               ...               ...
495 2021-04-18 05:26:41.805694  10.883107  11.917340  12.850949  13.834590           7.446702           8.458356           9.071314           9.948837          0.830943          0.961324          1.091647          1.102397
496 2021-04-18 11:52:30.124759  10.889271  11.946243  12.860569  13.870959           7.446702           8.458356           9.071314           9.948837          0.830943          0.961324          1.091647          1.102397
497 2021-04-18 13:27:59.735432  10.932131  11.977409  12.949012  13.929994           7.446702           8.458356           9.071314           9.948837          0.830943          0.961324          1.091647          1.102397
498 2021-04-18 18:58:02.280739  10.979734  11.988028  12.952918  13.991210           7.446702           8.458356           9.071314           9.948837          0.830943          0.961324          1.091647          1.102397
499 2021-04-18 19:17:01.745781  10.997603  11.995105  12.991302  13.995131           7.446702           8.458356           9.071314           9.948837          0.830943          0.961324          1.091647          1.102397
[500 rows x 13 columns]
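For comparison, here is a self-contained sketch of the same idea that keys on calendar-month periods instead of month-end timestamps. This is a variant, not the code above: the synthetic data, the two-column subset, and the helper column name prev_month are all assumptions made for illustration. The resulting columns are also grouped per value column (mean, then std) rather than all means followed by all standard deviations.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: one row per day over three months.
# Column names follow the question; the values are random.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Time": pd.date_range("2021-01-01", "2021-03-31", freq="D")})
value_cols = ["Value 1", "Value 2"]
for col in value_cols:
    df[col] = rng.normal(size=len(df))

# Label every row with its calendar month, e.g. Period("2021-02", "M").
month = df["Time"].dt.to_period("M")

# One row per month, one column per (value column, statistic) pair,
# flattened to names like "Value 1_prev_mean".
stats = df.groupby(month)[value_cols].agg(["mean", "std"])
stats.columns = [f"{col}_prev_{stat}" for col, stat in stats.columns]

# Each row looks up the month before its own; "prev_month" is a
# temporary helper column that is dropped again after the merge.
result = (
    df.assign(prev_month=month - 1)
    .merge(stats, how="left", left_on="prev_month", right_index=True)
    .drop(columns="prev_month")
)
print(result.tail())
```

Rows in the first month of the data have no previous month to look up, so their new columns are NaN, just as in the output above.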