
How to find and extract the last reading of recorded days/weeks/months from one column to new ones in a DataFrame using pandas?

In a large DataFrame, I have readings in one column and the local date and time of each reading, as datetime values, in another column. I want to add new columns to the same DataFrame that contain only the last reading of each recorded day, week, or month, one column per period. It is also important to be able to choose which day ends the week. This is an example of my data (here, I chose Wednesday as the last day of the week):

       local_date_time    weekday  readings
0  2022-04-29 17:03:25     Friday       468
1  2022-04-29 23:42:06     Friday       638
2  2022-04-30 00:22:06   Saturday       649
3  2022-04-30 16:42:07   Saturday       650
4  2022-04-30 19:42:06   Saturday       641
5  2022-04-30 23:59:06   Saturday      1301
6  2022-05-01 00:42:07     Sunday      1240
7  2022-05-01 04:12:07     Sunday       927
8  2022-05-01 09:52:07     Sunday       810
9  2022-05-01 16:42:07     Sunday      1024
10 2022-05-01 23:52:07     Sunday       551
11 2022-05-02 09:02:07     Monday       534
12 2022-05-02 13:42:07     Monday       684
13 2022-05-02 22:32:08     Monday       952
14 2022-05-02 23:59:07     Monday       628
15 2022-05-03 00:02:07    Tuesday       640
16 2022-05-03 06:12:08    Tuesday       762
17 2022-05-03 11:22:08    Tuesday       707
18 2022-05-03 14:12:08    Tuesday       623
19 2022-05-03 21:02:08    Tuesday       713
20 2022-05-03 23:42:08    Tuesday       606
21 2022-05-04 01:02:09  Wednesday       565
22 2022-05-04 05:32:09  Wednesday       495
23 2022-05-04 20:22:09  Wednesday       565
24 2022-05-04 23:59:09  Wednesday       693
25 2022-05-05 00:02:09   Thursday       723
26 2022-05-05 04:12:08   Thursday       534
27 2022-05-05 10:22:09   Thursday       464
28 2022-05-05 15:42:09   Thursday       479
29 2022-05-05 23:59:09   Thursday       478

For this purpose, I tried conditional assignment with "df.loc". This is the code that I wrote:

# Split the timestamp into the parts the conditions below need.
df['Day'] = df['local_date_time'].dt.day
df['Hour'] = df['local_date_time'].dt.hour
df['Min'] = df['local_date_time'].dt.minute
# Copy a reading into the new column only when it was taken at 23:59.
df.loc[(df['Hour'] == 23) & (df['Min'] >= 59), 'end_day'] = df['readings']
df.loc[(df['Hour'] == 23) & (df['Min'] >= 59) & (df['weekday'] == 'Wednesday'), 'end_week'] = df['readings']
df.loc[(df['Hour'] == 23) & (df['Min'] >= 59) & (df['Day'] == 30), 'end_month'] = df['readings']

The code works as long as there is a reading at 23:59 on every day. If a day has no reading at 23:59, that day gets no value. Also, with this approach I can only pick one fixed day of the month (e.g. the 30th in the example) as the month end, which does not work for months that have more or fewer days. This is the result that I would like to see:

    local_date_time    weekday  readings  end_day  end_week  end_month
2022-04-29 17:03:25     Friday       468
2022-04-29 23:42:06     Friday       638      638
2022-04-30 00:22:06   Saturday       649
2022-04-30 16:42:07   Saturday       650
2022-04-30 19:42:06   Saturday       641
2022-04-30 23:59:06   Saturday      1301     1301                 1301
2022-05-01 00:42:07     Sunday      1240
2022-05-01 04:12:07     Sunday       927
2022-05-01 09:52:07     Sunday       810
2022-05-01 16:42:07     Sunday      1024
2022-05-01 23:52:07     Sunday       551      551
2022-05-02 09:02:07     Monday       534
2022-05-02 13:42:07     Monday       684
2022-05-02 22:32:08     Monday       952
2022-05-02 23:59:07     Monday       628      628
2022-05-03 00:02:07    Tuesday       640
2022-05-03 06:12:08    Tuesday       762
2022-05-03 11:22:08    Tuesday       707
2022-05-03 14:12:08    Tuesday       623
2022-05-03 21:02:08    Tuesday       713
2022-05-03 23:42:08    Tuesday       606      606
2022-05-04 01:02:09  Wednesday       565
2022-05-04 05:32:09  Wednesday       495
2022-05-04 20:22:09  Wednesday       565
2022-05-04 23:59:09  Wednesday       693      693       693
2022-05-05 00:02:09   Thursday       723
2022-05-05 04:12:08   Thursday       534
2022-05-05 10:22:09   Thursday       464
2022-05-05 15:42:09   Thursday       479
2022-05-05 23:59:09   Thursday       478      478

Resampling with a proper DatetimeIndex will be useful here.

import numpy as np
import pandas as pd

# Make it a datetime index:
df['local_date_time'] = pd.to_datetime(df['local_date_time'])
df = df.set_index('local_date_time')

# Do the resampling. Each transform keeps the last reading of its bin
# and leaves NaN on every other row.
# End of day:
df['end_day'] = df.resample('D')['readings'].transform(lambda x: x.tail(1))
# Weekly, weeks ending on Wednesday:
df['end_week'] = df.resample('W-WED')['readings'].transform(lambda x: x.tail(1))
# End of month ('M' is deprecated as of pandas 2.2; use 'ME' there):
df['end_month'] = df.resample('M')['readings'].transform(lambda x: x.tail(1))

# Some corrections for the very end, where the last week/month may be incomplete:
# Clear non-Wednesdays:
df['end_week'] = df['end_week'].where(df.index.to_series().dt.weekday.eq(2), np.nan)
# Clear non-end-of-months:
df['end_month'] = df['end_month'].where(df.index.to_series().dt.is_month_end, np.nan)

Output:

                       weekday  readings  end_day  end_week  end_month
local_date_time
2022-04-29 17:03:25     Friday       468      NaN       NaN        NaN
2022-04-29 23:42:06     Friday       638    638.0       NaN        NaN
2022-04-30 00:22:06   Saturday       649      NaN       NaN        NaN
2022-04-30 16:42:07   Saturday       650      NaN       NaN        NaN
2022-04-30 19:42:06   Saturday       641      NaN       NaN        NaN
2022-04-30 23:59:06   Saturday      1301   1301.0       NaN     1301.0
2022-05-01 00:42:07     Sunday      1240      NaN       NaN        NaN
2022-05-01 04:12:07     Sunday       927      NaN       NaN        NaN
2022-05-01 09:52:07     Sunday       810      NaN       NaN        NaN
2022-05-01 16:42:07     Sunday      1024      NaN       NaN        NaN
2022-05-01 23:52:07     Sunday       551    551.0       NaN        NaN
2022-05-02 09:02:07     Monday       534      NaN       NaN        NaN
2022-05-02 13:42:07     Monday       684      NaN       NaN        NaN
2022-05-02 22:32:08     Monday       952      NaN       NaN        NaN
2022-05-02 23:59:07     Monday       628    628.0       NaN        NaN
2022-05-03 00:02:07    Tuesday       640      NaN       NaN        NaN
2022-05-03 06:12:08    Tuesday       762      NaN       NaN        NaN
2022-05-03 11:22:08    Tuesday       707      NaN       NaN        NaN
2022-05-03 14:12:08    Tuesday       623      NaN       NaN        NaN
2022-05-03 21:02:08    Tuesday       713      NaN       NaN        NaN
2022-05-03 23:42:08    Tuesday       606    606.0       NaN        NaN
2022-05-04 01:02:09  Wednesday       565      NaN       NaN        NaN
2022-05-04 05:32:09  Wednesday       495      NaN       NaN        NaN
2022-05-04 20:22:09  Wednesday       565      NaN       NaN        NaN
2022-05-04 23:59:09  Wednesday       693    693.0     693.0        NaN
2022-05-05 00:02:09   Thursday       723      NaN       NaN        NaN
2022-05-05 04:12:08   Thursday       534      NaN       NaN        NaN
2022-05-05 10:22:09   Thursday       464      NaN       NaN        NaN
2022-05-05 15:42:09   Thursday       479      NaN       NaN        NaN
2022-05-05 23:59:09   Thursday       478    478.0       NaN        NaN
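
One thing to watch with the correction step above: it keeps end_week only on rows that fall on a Wednesday, and end_month only on rows that fall on the last calendar day, so a week or month whose final reading came a day earlier ends up with no value at all. The following is a minimal sketch of an alternative, not the method used in the answer. The helper name `mark_period_ends` and its parameters are hypothetical, timestamps are assumed to be timezone-naive, and the rule for deciding that the final period in the data is still open is an assumption made for illustration.

```python
import pandas as pd


def mark_period_ends(df, time_col="local_date_time", value_col="readings",
                     week_freq="W-WED"):
    """Copy df and add end_day / end_week / end_month columns (hypothetical helper).

    Each new column holds the last reading of its period and NaN elsewhere.
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values(time_col)

    for col, freq in {"end_day": "D", "end_week": week_freq, "end_month": "M"}.items():
        # Label every row with the calendar period it falls in.
        period = out[time_col].dt.to_period(freq)
        # A row is the last of its period if no later row shares the label.
        is_last = ~period.duplicated(keep="last")
        if len(out):
            # Assumption: the final period in the data is still open unless
            # the newest reading falls on that period's last calendar day.
            newest = out[time_col].iloc[-1]
            final_period = period.iloc[-1]
            if newest.normalize() != final_period.end_time.normalize():
                is_last &= period != final_period
        out[col] = out[value_col].where(is_last)
    return out


# Trimmed sample taken from the question's data, with the Wednesday rows left out.
sample = pd.DataFrame({
    "local_date_time": [
        "2022-04-29 17:03:25", "2022-04-29 23:42:06", "2022-04-30 23:59:06",
        "2022-05-01 23:52:07", "2022-05-03 23:42:08", "2022-05-05 00:02:09",
        "2022-05-05 23:59:09",
    ],
    "readings": [468, 638, 1301, 551, 606, 723, 478],
})
result = mark_period_ends(sample)
print(result)
```

In this trimmed sample there is no Wednesday reading, so the Tuesday 23:42:08 reading (606) is the one marked in end_week, while the Thursday readings belong to a week and a month that have not finished yet and are left unmarked there. Passing a different `week_freq`, such as "W-SUN", changes which day closes the week.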
