In a large DataFrame, I have readings in one column and the local date and time of each reading, in datetime format, in another column. I want to add new columns to the same DataFrame that contain only the last reading of each recorded day, week, or month. It is also important to be able to choose the last day of the week. This is an example of my data (here, I chose Wednesday as the last day of the week):
local_date_time weekday readings
0 2022-04-29 17:03:25 Friday 468
1 2022-04-29 23:42:06 Friday 638
2 2022-04-30 00:22:06 Saturday 649
3 2022-04-30 16:42:07 Saturday 650
4 2022-04-30 19:42:06 Saturday 641
5 2022-04-30 23:59:06 Saturday 1301
6 2022-05-01 00:42:07 Sunday 1240
7 2022-05-01 04:12:07 Sunday 927
8 2022-05-01 09:52:07 Sunday 810
9 2022-05-01 16:42:07 Sunday 1024
10 2022-05-01 23:52:07 Sunday 551
11 2022-05-02 09:02:07 Monday 534
12 2022-05-02 13:42:07 Monday 684
13 2022-05-02 22:32:08 Monday 952
14 2022-05-02 23:59:07 Monday 628
15 2022-05-03 00:02:07 Tuesday 640
16 2022-05-03 06:12:08 Tuesday 762
17 2022-05-03 11:22:08 Tuesday 707
18 2022-05-03 14:12:08 Tuesday 623
19 2022-05-03 21:02:08 Tuesday 713
20 2022-05-03 23:42:08 Tuesday 606
21 2022-05-04 01:02:09 Wednesday 565
22 2022-05-04 05:32:09 Wednesday 495
23 2022-05-04 20:22:09 Wednesday 565
24 2022-05-04 23:59:09 Wednesday 693
25 2022-05-05 00:02:09 Thursday 723
26 2022-05-05 04:12:08 Thursday 534
27 2022-05-05 10:22:09 Thursday 464
28 2022-05-05 15:42:09 Thursday 479
29 2022-05-05 23:59:09 Thursday 478
I tried to solve this with conditional "df.loc" assignments. This is the code that I wrote:
df['Day'] = df['local_date_time'].dt.day
df['Hour'] = df['local_date_time'].dt.hour
df['Min'] = df['local_date_time'].dt.minute
df.loc[(df['Hour'] == 23) & (df['Min'] >= 59),'end_day'] = df['readings']
df.loc[(df['Hour'] == 23) & (df['Min'] >= 59) & (df['weekday'] == 'Wednesday'),'end_week'] = df['readings']
df.loc[(df['Hour'] == 23) & (df['Min'] >= 59) & (df['Day'] == 30),'end_month'] = df['readings']
The code works as long as there is a reading at 23:59 of each day. However, if a day has no reading at 23:59, the code does not work. Also, with this approach I can only choose one specific date (e.g. the 30th in the example) as the last day of the month, which does not work for months that have more or fewer days. This is the result that I would like to see:
local_date_time      weekday    readings  end_day  end_week  end_month
2022-04-29 17:03:25  Friday     468
2022-04-29 23:42:06  Friday     638       638
2022-04-30 00:22:06  Saturday   649
2022-04-30 16:42:07  Saturday   650
2022-04-30 19:42:06  Saturday   641
2022-04-30 23:59:06  Saturday   1301      1301               1301
2022-05-01 00:42:07  Sunday     1240
2022-05-01 04:12:07  Sunday     927
2022-05-01 09:52:07  Sunday     810
2022-05-01 16:42:07  Sunday     1024
2022-05-01 23:52:07  Sunday     551       551
2022-05-02 09:02:07  Monday     534
2022-05-02 13:42:07  Monday     684
2022-05-02 22:32:08  Monday     952
2022-05-02 23:59:07  Monday     628       628
2022-05-03 00:02:07  Tuesday    640
2022-05-03 06:12:08  Tuesday    762
2022-05-03 11:22:08  Tuesday    707
2022-05-03 14:12:08  Tuesday    623
2022-05-03 21:02:08  Tuesday    713
2022-05-03 23:42:08  Tuesday    606       606
2022-05-04 01:02:09  Wednesday  565
2022-05-04 05:32:09  Wednesday  495
2022-05-04 20:22:09  Wednesday  565
2022-05-04 23:59:09  Wednesday  693       693      693
2022-05-05 00:02:09  Thursday   723
2022-05-05 04:12:08  Thursday   534
2022-05-05 10:22:09  Thursday   464
2022-05-05 15:42:09  Thursday   479
2022-05-05 23:59:09  Thursday   478       478
Resampling with a proper DatetimeIndex will be useful here.
import numpy as np
import pandas as pd

# Make it a datetime index:
df['local_date_time'] = pd.to_datetime(df['local_date_time'])
df = df.set_index('local_date_time')
# Do the resampling:
# End of Day:
df['end_day'] = df.resample('D')['readings'].transform(lambda x: x.tail(1))
# Weekly, Wednesdays:
df['end_week'] = df.resample('W-Wed')['readings'].transform(lambda x: x.tail(1))
# End of Month ('M' is the month-end alias; pandas 2.2+ deprecates it in favor of 'ME'):
df['end_month'] = df.resample('M')['readings'].transform(lambda x: x.tail(1))
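# Corrections for incomplete periods (e.g. the unfinished week and month at the end of the data):
# Keep the weekly value only on rows that fall on a Wednesday (weekday 2):
df['end_week'] = df['end_week'].where(df.index.to_series().dt.weekday.eq(2), np.nan)
# Keep the monthly value only on rows that fall on the last day of a month:
df['end_month'] = df['end_month'].where(df.index.to_series().dt.is_month_end, np.nan)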
Output:
weekday readings end_day end_week end_month
local_date_time
2022-04-29 17:03:25 Friday 468 NaN NaN NaN
2022-04-29 23:42:06 Friday 638 638.0 NaN NaN
2022-04-30 00:22:06 Saturday 649 NaN NaN NaN
2022-04-30 16:42:07 Saturday 650 NaN NaN NaN
2022-04-30 19:42:06 Saturday 641 NaN NaN NaN
2022-04-30 23:59:06 Saturday 1301 1301.0 NaN 1301.0
2022-05-01 00:42:07 Sunday 1240 NaN NaN NaN
2022-05-01 04:12:07 Sunday 927 NaN NaN NaN
2022-05-01 09:52:07 Sunday 810 NaN NaN NaN
2022-05-01 16:42:07 Sunday 1024 NaN NaN NaN
2022-05-01 23:52:07 Sunday 551 551.0 NaN NaN
2022-05-02 09:02:07 Monday 534 NaN NaN NaN
2022-05-02 13:42:07 Monday 684 NaN NaN NaN
2022-05-02 22:32:08 Monday 952 NaN NaN NaN
2022-05-02 23:59:07 Monday 628 628.0 NaN NaN
2022-05-03 00:02:07 Tuesday 640 NaN NaN NaN
2022-05-03 06:12:08 Tuesday 762 NaN NaN NaN
2022-05-03 11:22:08 Tuesday 707 NaN NaN NaN
2022-05-03 14:12:08 Tuesday 623 NaN NaN NaN
2022-05-03 21:02:08 Tuesday 713 NaN NaN NaN
2022-05-03 23:42:08 Tuesday 606 606.0 NaN NaN
2022-05-04 01:02:09 Wednesday 565 NaN NaN NaN
2022-05-04 05:32:09 Wednesday 495 NaN NaN NaN
2022-05-04 20:22:09 Wednesday 565 NaN NaN NaN
2022-05-04 23:59:09 Wednesday 693 693.0 693.0 NaN
2022-05-05 00:02:09 Thursday 723 NaN NaN NaN
2022-05-05 04:12:08 Thursday 534 NaN NaN NaN
2022-05-05 10:22:09 Thursday 464 NaN NaN NaN
2022-05-05 15:42:09 Thursday 479 NaN NaN NaN
2022-05-05 23:59:09 Thursday 478 478.0 NaN NaN
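If the transform-based resampling behaves differently on your pandas version, the same idea can be sketched with periods instead. This is only a minimal sketch, not the method used above: the function name `mark_period_ends` and its parameters are hypothetical, and it assumes a value should be kept only when the last reading of a period falls on that period's final calendar day, which mirrors the corrections applied above.

```python
import pandas as pd


def mark_period_ends(df, time_col="local_date_time", value_col="readings", week_end="WED"):
    """Copy the last reading of each day/week/month into end_day/end_week/end_month.

    Hypothetical helper; week_end is a three-letter weekday code such as "WED".
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values(time_col)
    ts = out[time_col]

    # Map every timestamp to the day, week, and month period that contains it.
    periods = {
        "end_day": ts.dt.to_period("D"),
        "end_week": ts.dt.to_period(f"W-{week_end}"),
        "end_month": ts.dt.to_period("M"),
    }
    for col, period in periods.items():
        # After sorting, the last occurrence of a period is its latest reading.
        is_last = ~period.duplicated(keep="last")
        # Assumption: drop periods whose last reading is not on the period's final day.
        on_final_day = period.dt.end_time.dt.normalize() == ts.dt.normalize()
        out[col] = out[value_col].where(is_last & on_final_day)
    return out


# A few rows taken from the question's sample data.
sample = pd.DataFrame({
    "local_date_time": [
        "2022-04-29 17:03:25", "2022-04-29 23:42:06", "2022-04-30 23:59:06",
        "2022-05-03 23:42:08", "2022-05-04 20:22:09", "2022-05-04 23:59:09",
        "2022-05-05 23:59:09",
    ],
    "readings": [468, 638, 1301, 606, 565, 693, 478],
})
result = mark_period_ends(sample, week_end="WED")
print(result)
```

Changing `week_end` (for example to "SUN") moves the week boundary, and the month end is derived from the calendar, so no day number such as 30 is hard-coded.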