I have a DataFrame like this:
import pandas as pd
df = {'Request': [0, 0, 1, 0, 1, 0, 0],
'Time': ['16:00', '17:00', '18:00', '19:00', '20:00', '20:30', '24:00'],
'grant': [3, 0, 0, 5, 0, 0, 5]}
pd.DataFrame(df).set_index('Time')
Out[16]:
       Request  grant
Time
16:00        0      3
17:00        0      0
18:00        1      0
19:00        0      5
20:00        1      0
20:30        0      0
24:00        0      5
Values in the 'Request' column are boolean flags denoting whether a request was made: 1 = request, 0 = no request. Values in the 'grant' column denote the initial grant size.
I want to calculate the time between each request and the grant that follows it. In this case the results would be 19:00 - 18:00 = 1 hr and 24:00 - 20:00 = 4 hrs. Is there an easy way to perform this operation on a large data set using pandas?
I would go about it something like this:
import pandas as pd

df = {'Request': [0, 0, 1, 0, 1, 0, 0],
      'Time': ['16:00', '17:00', '18:00', '19:00', '20:00', '20:30', '24:00'],
      'grant': [3, 0, 0, 5, 0, 0, 5]}
df = pd.DataFrame(df)  # create DataFrame
# get rid of any rows that have neither a grant nor a request
df = df[(df[['grant', 'Request']] != 0).any(axis=1)].copy()
# change the time from HH:MM to number of minutes
df['Time'] = df['Time'].str.split(":").apply(lambda x: int(x[0])*60 + int(x[1]))
# get the difference between consecutive remaining rows
df['timeElapsed'] = df['Time'].diff()
# filter out the requests to keep only the grants and their times;
# also drop the NaN from the first row
df = df[df['grant'] != 0].dropna()
# keep only the timeElapsed and grant columns
df = df[['timeElapsed', 'grant']]
The output then looks like this, with timeElapsed in minutes:
   timeElapsed  grant
3         60.0      5
6        240.0      5
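One reason this answer needs no special handling for 24:00 is that it never parses the strings as clock times; it only does arithmetic on the hour and minute parts. The sketch below isolates that conversion step. The helper name hhmm_to_minutes is hypothetical and not part of the answer, and it assumes every value is a well-formed 'HH:MM' string.

```python
import pandas as pd

def hhmm_to_minutes(times: pd.Series) -> pd.Series:
    """Convert 'HH:MM' strings to minutes since midnight (hypothetical helper)."""
    # Split into two integer columns: column 0 holds hours, column 1 holds minutes
    parts = times.str.split(":", expand=True).astype(int)
    return parts[0] * 60 + parts[1]

times = pd.Series(['18:00', '20:30', '24:00'])
minutes = hhmm_to_minutes(times)
print(minutes.tolist())
```

Because '24:00' simply becomes 24 * 60 minutes, the later diff() step sees a value larger than any other time on that day.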
To get the difference you need to convert the Time column to datetime, but 24:00 has to be changed first or the conversion raises an error. You can do that with Series.mask followed by pd.to_datetime. Then filter the DataFrame so it starts at the first row where Request == 1 (df2), create groups that begin at each request, and compute the difference per group with groupby.first and groupby.last:
# transform the Time column so differences can be computed
df['Time'] = df['Time'].mask(df['Time'].eq('24:00'), '00:00')
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M')
# select rows from the first request == 1 onward
mask = df.Request.eq(1).cumsum() > 0
df2 = df[mask]
# create a Series to group by: each request starts a new group
groups = df2['Request'].eq(1).cumsum()
# get the difference by group
g = df2.groupby(groups)['Time']
diff = (g.last() - g.first()).dt.seconds / 3600
print(diff)
Request
1    1.0
2    4.0
Name: Time, dtype: float64
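It may not be obvious why the second group comes out as 4.0 when 24:00 was rewritten to 00:00, which makes the raw difference negative. The sketch below shows the mechanism with two standalone timestamps; the date 2000-01-01 is an arbitrary assumption used only for illustration. A negative Timedelta is stored as a negative day count plus a non-negative seconds part, and .seconds returns only that seconds part.

```python
import pandas as pd

start = pd.Timestamp('2000-01-01 20:00')
end = pd.Timestamp('2000-01-01 00:00')  # '24:00' rewritten as '00:00' on the same day

delta = end - start  # stored as -1 day plus 4 hours
hours_via_seconds = delta.seconds / 3600        # uses only the seconds component
hours_via_total = delta.total_seconds() / 3600  # uses the full signed duration

print(hours_via_seconds, hours_via_total)
```

So the .dt.seconds approach effectively wraps around midnight, which should only be relied on when every request-to-grant gap is shorter than 24 hours.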
If you want to create a new column, you can use transform:
# transform the Time column so differences can be computed
df['Time'] = df['Time'].mask(df['Time'].eq('24:00'), '00:00')
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M')
df['Time'] = df['Time'].dt.hour  # keeps the hour only, so 20:30 becomes 20
# select rows from the first request == 1 onward
mask = df.Request.eq(1).cumsum() > 0
df2 = df[mask]
# create a Series to group by: each request starts a new group
groups = df2['Request'].eq(1).cumsum()
# compute the difference per group and save it in a new column
g = df2.groupby(groups)['Time']
df.loc[mask, 'difference'] = g.transform(lambda x: x.iloc[-1] - x.iloc[0])
# a negative difference means the group wrapped past midnight, so add 24 hours
df['difference'] = df['difference'].mask(df['difference'] < 0, df['difference'] + 24)
print(df)
   Request  Time  grant  difference
0        0    16      3         NaN
1        0    17      0         NaN
2        1    18      0         1.0
3        0    19      5         1.0
4        1    20      0         4.0
5        0    20      0         4.0
6        0     0      5         4.0
You first need to convert your Time index into something subtractable to find the time delta. Using pd.to_datetime does not work because there's no 24:00. The solution below uses decimal time (1:30 PM = 13.5):
# Convert the index into decimal time
df.index = pd.to_timedelta(df.index + ':00') / pd.Timedelta(hours=1)
# Get time when each request was made
r = df[df['Request'] != 0].index.to_series()
# Get time where each grant was made
g = df[df['grant'] != 0].index.to_series()
# `asof` means "for each label in `g.index`, get the last available value in `r`"
tmp = r.asof(g)
df['Delta'] = tmp.index - tmp
Result:
      Request  grant  Delta
Time
16.0        0      3    NaN
17.0        0      0    NaN
18.0        1      0    NaN
19.0        0      5    1.0
20.0        1      0    NaN
20.5        0      0    NaN
24.0        0      5    4.0
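The asof step in the answer above does most of the work, so here is a small standalone sketch of what it returns. The values are the decimal request and grant times from the example, hard-coded as an assumption rather than derived from df, and the variable names grant_times and last_request are illustrative.

```python
import pandas as pd

# Request times in decimal hours, indexed by themselves (like `r` in the answer)
r = pd.Series([18.0, 20.0], index=[18.0, 20.0])
# Times at which a grant was made (like `g.index` in the answer)
grant_times = [16.0, 19.0, 24.0]

# For each grant time, look up the most recent request at or before it;
# a grant with no earlier request gets NaN
last_request = r.asof(grant_times)

# Waiting time = grant time minus the matched request time
delta = last_request.index.to_series() - last_request
print(delta)
```

The grant at 16.0 has no preceding request, so its delta is NaN, which matches the first row of the result table.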