I have this DataFrame, where each row is a task that ran over some time period:
                 Date          Start Time            End Time
0  2016-01-01 0:00:00  2016-01-01 0:10:00  2016-01-01 0:25:00
1  2016-01-01 0:00:00  2016-01-01 1:17:00  2016-01-01 1:31:00
2  2016-01-02 0:00:00  2016-01-02 0:30:00  2016-01-02 0:32:00
...               ...                 ...                 ...
I want to convert this df to 30-minute intervals. Expected outcome:
                    Date  Hours
1     2016-01-01 0:30:00   0:15
2     2016-01-01 1:00:00   0:00
3     2016-01-01 1:30:00   0:13
4     2016-01-01 2:00:00   0:01
5     2016-01-01 2:30:00   0:00
6     2016-01-01 3:00:00   0:00
...                  ...    ...
47   2016-01-01 23:30:00   0:00
48   2016-01-02 23:59:59   0:00
49   2016-01-02 00:30:00   0:00
50   2016-01-02 01:00:00   0:02
...                  ...    ...
I was trying to do this with a for loop, which was getting tedious. Is there a simple way to do it in pandas?
IIUC you can discard the Date column, get the time difference between start and end, groupby 30 minutes and agg on first (assuming you always have only one entry per 30-minute slot):
print (df.assign(Diff=df["End Time"]-df["Start Time"])
         .groupby(pd.Grouper(key="Start Time", freq="30min"))  # "30T" is deprecated in recent pandas
         .agg({"Diff": "first"})
         .fillna(pd.Timedelta(seconds=0)))
                               Diff
Start Time
2016-01-01 00:00:00 0 days 00:15:00
2016-01-01 00:30:00 0 days 00:00:00
2016-01-01 01:00:00 0 days 00:14:00
2016-01-01 01:30:00 0 days 00:00:00
2016-01-01 02:00:00 0 days 00:00:00
2016-01-01 02:30:00 0 days 00:00:00
...
2016-01-02 00:30:00 0 days 00:02:00
The idea is to create a series of 0s with a per-minute DatetimeIndex between the min start time and the max end time. Then add 1 at each Start Time and subtract 1 at each End Time. You can then use cumsum to flag every minute between Start and End, resample.sum per 30 minutes, and reset_index. The last line of code is only there to get the proper format in the Hours column.
# create a series of 0 with a per-minute datetime index
res = pd.Series(data=0,
                index=pd.DatetimeIndex(pd.date_range(df['Start Time'].min(),
                                                     df['End Time'].max(),
                                                     freq='min'),  # 'T' is deprecated in recent pandas
                                       name='Dates'),
                name='Hours')
# add 1 at each start time and subtract 1 at each end time
res.loc[df['Start Time']] += 1
res.loc[df['End Time']] -= 1
# cumsum flags each minute inside a task, then sum the flags per 30 minutes
res = (res.cumsum()
          .resample('30min', label='right').sum()
          .reset_index('Dates')
      )
# change the format of the Hours column, honestly not necessary
# (%M only accepts 0-59, so this assumes no slot sums to more than 59 minutes)
res['Hours'] = pd.to_datetime(res['Hours'].astype(str), format='%M').dt.strftime('%H:%M') # or .dt.time
print(res)
Dates Hours
0 2016-01-01 00:30:00 00:15
1 2016-01-01 01:00:00 00:00
2 2016-01-01 01:30:00 00:13
3 2016-01-01 02:00:00 00:01
4 2016-01-01 02:30:00 00:00
5 2016-01-01 03:00:00 00:00
...
48 2016-01-02 00:30:00 00:00
49 2016-01-02 01:00:00 00:02
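The steps above can be checked end to end on the three sample rows with the sketch below. It is a variation rather than the exact code above: the DataFrame construction, the `value_counts().reindex(...)` step (used so that several tasks starting or ending on the same minute are all counted) and the plain string formatting of the Hours column are my own assumptions.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "Start Time": pd.to_datetime(["2016-01-01 00:10:00",
                                  "2016-01-01 01:17:00",
                                  "2016-01-02 00:30:00"]),
    "End Time": pd.to_datetime(["2016-01-01 00:25:00",
                                "2016-01-01 01:31:00",
                                "2016-01-02 00:32:00"]),
})

# One entry per minute between the earliest start and the latest end.
idx = pd.date_range(df["Start Time"].min(), df["End Time"].max(), freq="min")

# +1 at every start minute, -1 at every end minute.
starts = df["Start Time"].value_counts().reindex(idx, fill_value=0)
ends = df["End Time"].value_counts().reindex(idx, fill_value=0)

# Running total = number of tasks active during each minute.
active = (starts - ends).cumsum()

# Minutes of activity per 30-minute bin, labelled by the right edge of the bin.
minutes = active.resample("30min", label="right").sum()

res = minutes.rename_axis("Dates").rename("Minutes").reset_index()
# Format as HH:MM without going through to_datetime, so values above 59 also work.
res["Hours"] = res["Minutes"].map(lambda m: f"{int(m) // 60:02d}:{int(m) % 60:02d}")

print(res)
```

Keeping the integer Minutes column next to the formatted Hours column is just a convenience for further arithmetic; drop it if only the string is needed.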