I have a DataFrame, say df (I treat the last two columns as datetime64[ns], not str):
import datetime as dt
import pandas as pd

data = [['abc', 'abc1', '1_1', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc1', '1_2', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc2', '1_1', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc2', '1_2', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc3', '1_1', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '1_2', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '2_1', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999'],
['abc', 'abc3', '2_2', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999']]
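df = pd.DataFrame(data, columns = ['vehicle', 'order', 'work', 'Start', 'Finish'])
# treat the last two columns as datetime64[ns], not str
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])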
I want to find the time between two works. For example, I want to calculate the time between the finishing time of work 1_1 (vehicle: abc, order: abc1) and the starting time of work 1_2. I calculate this for each distinct order.
vehicle order work Start Finish
0 abc abc1 1_1 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000
1 abc abc1 1_2 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000
2 abc abc2 1_1 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001
3 abc abc2 1_2 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001
4 abc abc3 1_1 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000
5 abc abc3 1_2 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000
6 abc abc3 2_1 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999
7 abc abc3 2_2 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999
I have written the following code for this, and it works:
po_unique = df['order'].unique()
appended_data = []
for pos in po_unique:
    # rows of the current order, re-indexed from 0
    x1 = df.copy()
    x1 = x1.loc[x1['order'] == pos, :]
    x1.reset_index(drop = True, inplace = True)
    #print(x1)
    aList = []
    for i in range(len(x1) - 1):
        # gap in days between the next work's start and this work's finish
        t = (x1.Start[i + 1] - x1.Finish[i])/ dt.timedelta(hours=24)
        aList.append(t)
    aList.insert(0, 0)  # the first work of an order has no predecessor
    x2 = x1.copy()
    x2['flag'] = aList
    appended_data.append(x2)
appended_data = pd.concat(appended_data)
I would like some feedback on this code. Is there an alternative way to do this? The output of appended_data[['order', 'work', 'flag']] looks like this:
Out[112]:
order work flag
0 abc1 1_1 0.000000
1 abc1 1_2 -1.166666
0 abc2 1_1 0.000000
1 abc2 1_2 -0.624999
0 abc3 1_1 0.000000
1 abc3 1_2 -2.020833
2 abc3 2_1 1.000000
3 abc3 2_2 -0.052082
Now I want to create another column, flag1, that contains 'F' when the value in the flag column is greater than some threshold. I can do this with .apply(), like:
threshold = 0.9
appended_data['flag1'] = appended_data.apply(lambda row: 'F' if row['flag'] > threshold else ' ', axis = 1)
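As an aside, the same single-threshold flag can be computed without a row-wise .apply(). A minimal sketch, assuming numpy is installed; the small frame below is rebuilt by hand from the flag values printed above, so it stands on its own:

```python
import numpy as np
import pandas as pd

# flag values copied from the appended_data output above
appended_data = pd.DataFrame({
    'order': ['abc1', 'abc1', 'abc2', 'abc2', 'abc3', 'abc3', 'abc3', 'abc3'],
    'work': ['1_1', '1_2', '1_1', '1_2', '1_1', '1_2', '2_1', '2_2'],
    'flag': [0.0, -1.166666, 0.0, -0.624999, 0.0, -2.020833, 1.0, -0.052082],
})

threshold = 0.9
# vectorized equivalent of the row-wise .apply(): 'F' where flag exceeds the threshold
appended_data['flag1'] = np.where(appended_data['flag'] > threshold, 'F', ' ')
print(appended_data)
```

Only the 2_1 row of abc3 (flag 1.0) exceeds 0.9 here, so it is the only one marked 'F'.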
But what if I want to flag with two different thresholds: one for "inside" transitions, like 1_1 to 1_2, and another for "outside" transitions (when the prefix changes), like 1_2 to 2_1? Say:
threshold_sameprefix = -1.0
threshold_diffprefix = 0.8
Expected output
vehicle order work flag flag1
abc abc1 1_1 0.000000
abc abc1 1_2 -1.166666
abc abc2 1_1 0.000000
abc abc2 1_2 -0.624999 F1
abc abc3 1_1 0.000000
abc abc3 1_2 -2.020833
abc abc3 2_1 1.000000 F2
abc abc3 2_2 -0.052082 F1
Please don't just take the minimum of the two thresholds and reuse my single-threshold logic above. I want logic that assigns the flags step by step, so that I can customize it.
1) Split the work id into 2 parts, work_prefix and work_suffix:
df[['work_prefix', 'work_suffix']] = df['work'].str.split('_', expand=True)
2) Then, define a set of boolean masks corresponding to the conditions. Each row is compared with the previous row of the same order, using .groupby() with .shift(), so the masks respect the group boundaries:
threshold_sameprefix = -1.0 # given threshold value
threshold_diffprefix = 0.8 # given threshold value
w_ne = df['work'] != df.groupby('order')['work'].shift() # work id changed
wp_eq = df['work_prefix'] == df.groupby('order')['work_prefix'].shift() # same work prefix
wp_ne = df['work_prefix'] != df.groupby('order')['work_prefix'].shift() # different work prefix
m1 = w_ne & wp_eq & (df['flag'] > threshold_sameprefix) # condition for 'F1'
m2 = w_ne & wp_ne & (df['flag'] > threshold_diffprefix) # condition for 'F2'
3) Finally, use .loc with the boolean masks to fill flag1 with the values F1 and F2, as follows:
df['flag1'] = ' ' # init flag1 to blank
df.loc[m1, 'flag1'] = 'F1'
df.loc[m2, 'flag1'] = 'F2'
Input:
vehicle order work flag
0 abc abc1 1_1 0.000000
1 abc abc1 1_2 -1.166666
2 abc abc2 1_1 0.000000
3 abc abc2 1_2 -0.624999
4 abc abc3 1_1 0.000000
5 abc abc3 1_2 -2.020833
6 abc abc3 2_1 1.000000
7 abc abc3 2_2 -0.052082
Output:
vehicle order work flag work_prefix work_suffix flag1
0 abc abc1 1_1 0.000000 1 1
1 abc abc1 1_2 -1.166666 1 2
2 abc abc2 1_1 0.000000 1 1
3 abc abc2 1_2 -0.624999 1 2 F1
4 abc abc3 1_1 0.000000 1 1
5 abc abc3 1_2 -2.020833 1 2
6 abc abc3 2_1 1.000000 2 1 F2
7 abc abc3 2_2 -0.052082 2 2 F1
Optionally, you can remove the 2 helper columns work_prefix and work_suffix with:
df = df.drop(['work_prefix', 'work_suffix'], axis=1)
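Since you asked for logic you can customize step by step, here is a minimal sketch that drives the same masks from a small rule table. The `rules` list and its layout are hypothetical (not a pandas feature); later rules overwrite earlier ones, matching the order of the .loc assignments in step 3. It also adds a `has_prev` guard, an assumption of this sketch, so the first work of each order is never flagged: without it, the comparison against the NaN produced by .shift() counts as a prefix change, which would flag first rows (flag 0) as F2 if threshold_diffprefix were ever below 0.

```python
import pandas as pd

# sample input with the flag column already computed (values from the Input table)
df = pd.DataFrame({
    'vehicle': ['abc'] * 8,
    'order': ['abc1', 'abc1', 'abc2', 'abc2', 'abc3', 'abc3', 'abc3', 'abc3'],
    'work': ['1_1', '1_2', '1_1', '1_2', '1_1', '1_2', '2_1', '2_2'],
    'flag': [0.0, -1.166666, 0.0, -0.624999, 0.0, -2.020833, 1.0, -0.052082],
})

# split the work id, e.g. '2_1' -> prefix '2', suffix '1'
df[['work_prefix', 'work_suffix']] = df['work'].str.split('_', expand=True)

# previous row's work and prefix within the same order (NaN on each order's first row)
prev_work = df.groupby('order')['work'].shift()
prev_prefix = df.groupby('order')['work_prefix'].shift()

has_prev = prev_work.notna()                 # exclude the first work of each order
w_ne = has_prev & (df['work'] != prev_work)  # work id changed
same_prefix = df['work_prefix'] == prev_prefix

# hypothetical rule table: (label, transitions it applies to, threshold)
# add, remove or reorder entries to customize; later rules overwrite earlier ones
rules = [
    ('F1', same_prefix, -1.0),   # "inside", e.g. 1_1 -> 1_2
    ('F2', ~same_prefix, 0.8),   # "outside", e.g. 1_2 -> 2_1
]

df['flag1'] = ' '  # init flag1 to blank
for label, transition, threshold in rules:
    mask = w_ne & transition & (df['flag'] > threshold)
    df.loc[mask, 'flag1'] = label

print(df[['order', 'work', 'flag', 'flag1']])
```

With the thresholds from the question, this reproduces the Output table above: F1 on abc2 1_2 and abc3 2_2, F2 on abc3 2_1.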
To set up your first column, flag, more efficiently without looping, you can use:
import pandas as pd

data = [['abc', 'abc1', '1_1', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc1', '1_2', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc2', '1_1', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc2', '1_2', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc3', '1_1', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '1_2', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '2_1', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999'],
['abc', 'abc3', '2_2', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999']]
df = pd.DataFrame(data, columns = ['vehicle', 'order', 'work', 'Start', 'Finish'])
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
Main code to replace your loop:
df['flag'] = ((df['Start'] - df.groupby('order')['Finish'].shift()) / pd.Timedelta(days=1)).fillna(0)
Result:
print(df)
vehicle order work Start Finish flag
0 abc abc1 1_1 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000 0.000000
1 abc abc1 1_2 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000 -1.166666
2 abc abc2 1_1 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001 0.000000
3 abc abc2 1_2 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001 -0.624999
4 abc abc3 1_1 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000 0.000000
5 abc abc3 1_2 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000 -2.020833
6 abc abc3 2_1 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999 1.000000
7 abc abc3 2_2 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999 -0.052082
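To confirm that the one-liner gives the same numbers as the loop in the question, here is a small check on the sample data. The compact loop restates the question's logic using groupby(sort=False), which keeps the orders in their original sequence; comparing with np.allclose is a choice of this sketch:

```python
import datetime as dt

import numpy as np
import pandas as pd

data = [['abc', 'abc1', '1_1', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
        ['abc', 'abc1', '1_2', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
        ['abc', 'abc2', '1_1', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
        ['abc', 'abc2', '1_2', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
        ['abc', 'abc3', '1_1', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
        ['abc', 'abc3', '1_2', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
        ['abc', 'abc3', '2_1', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999'],
        ['abc', 'abc3', '2_2', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999']]
df = pd.DataFrame(data, columns=['vehicle', 'order', 'work', 'Start', 'Finish'])
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])

# vectorized: previous Finish within the same order, gap expressed in days
prev_finish = df.groupby('order')['Finish'].shift()
vectorized = ((df['Start'] - prev_finish) / pd.Timedelta(days=1)).fillna(0)

# loop version, same logic as the question's code
looped = []
for _, grp in df.groupby('order', sort=False):
    grp = grp.reset_index(drop=True)
    looped.append(0.0)  # the first work of an order has no predecessor
    for i in range(len(grp) - 1):
        looped.append((grp['Start'][i + 1] - grp['Finish'][i]) / dt.timedelta(hours=24))

print(np.allclose(vectorized.to_numpy(), looped))
```

Because the rows of each order are contiguous in df, the flattened loop results line up row by row with the vectorized column.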