Pandas add label column based on conditions
Setup:
import pandas as pd

df_input = pd.DataFrame(data={
    'Session': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    'Action': ['Start', 'Action1', 'Action2', 'Action3', 'Pause', 'Action4',
               'Action5', 'Resume', 'Finish', 'Start', 'Action1', 'Action2', 'Pause',
               'Action3', 'Action4', 'Resume', 'Finish', 'Start', 'Action1',
               'Action2', 'Finish']})
df_output = pd.DataFrame(data={
    'Session': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    'Action': ['Start', 'Action1', 'Action2', 'Action3', 'Pause', 'Action4', 'Action5',
               'Resume', 'Finish', 'Start', 'Action1', 'Action2', 'Pause', 'Action3',
               'Action4', 'Resume', 'Finish', 'Start', 'Action1', 'Action2', 'Finish'],
    'Label': [0, 1, 2, 3, 4, 4, 4, 4, 5, 0, 1, 2, 3, 3, 3, 3, 4, 0, 1, 2, 3]})
Input:
Session Action
0 0 Start
1 0 Action1
2 0 Action2
3 0 Action3
4 0 Pause
5 0 Action4
6 0 Action5
7 0 Resume
8 0 Finish
9 1 Start
10 1 Action1
11 1 Action2
12 1 Pause
13 1 Action3
14 1 Action4
15 1 Resume
16 1 Finish
17 2 Start
18 2 Action1
19 2 Action2
20 2 Finish
Output:
Session Action Label
0 0 Start 0
1 0 Action1 1
2 0 Action2 2
3 0 Action3 3
4 0 Pause 4 #Same
5 0 Action4 4 #Same
6 0 Action5 4 #Same
7 0 Resume 4 #Same
8 0 Finish 5
9 1 Start 0
10 1 Action1 1
11 1 Action2 2
12 1 Pause 3 #Same
13 1 Action3 3 #Same
14 1 Action4 3 #Same
15 1 Resume 3 #Same
16 1 Finish 4
17 2 Start 0
18 2 Action1 1
19 2 Action2 2
20 2 Finish 3
Labeling Rule:
In every session, the label increases by one per row, except that it stays the same from Pause until Resume (inclusive).
I have been trying for a long time but have not been able to keep the labels the same from Pause until Resume.
This only requires some explicit logic, for example:
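label_list, label, paused = [], 0, False
for index, row in df_input.iterrows():
    action = row['Action']
    if paused and action != 'Resume':
        # Inside a paused block: keep the label frozen.
        label_list.append(label)
    elif action == 'Resume':
        # Resume still gets the frozen label; counting continues on the next row.
        label_list.append(label)
        label += 1
        paused = False
    elif action == 'Pause':
        # Pause gets the current label and freezes it.
        label_list.append(label)
        paused = True
    elif action == 'Finish':
        # Finish ends the session, so reset the counter for the next one.
        label_list.append(label)
        label = 0
    else:
        label_list.append(label)
        label += 1
# Attach the collected labels as a new column.
df_input['Label'] = label_list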
Output:
Session Action Label
0 0 Start 0
1 0 Action1 1
2 0 Action2 2
3 0 Action3 3
4 0 Pause 4
5 0 Action4 4
6 0 Action5 4
7 0 Resume 4
8 0 Finish 5
9 1 Start 0
10 1 Action1 1
11 1 Action2 2
12 1 Pause 3
13 1 Action3 3
14 1 Action4 3
15 1 Resume 3
16 1 Finish 4
17 2 Start 0
18 2 Action1 1
19 2 Action2 2
20 2 Finish 3
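The loop above resets its counter only when it sees Finish, so it assumes every session ends with Finish and is not left paused. As a minimal sketch (the helper name label_session is hypothetical, not part of the original answer), the same rule can be applied one session at a time so that no state carries over between sessions:

```python
import pandas as pd

def label_session(actions):
    """Return labels for one session; the label is frozen from Pause through Resume."""
    labels, label, paused = [], 0, False
    for action in actions:
        labels.append(label)
        if action == 'Pause':
            paused = True           # freeze the label starting at this row
        elif action == 'Resume':
            paused = False          # unfreeze; the next row gets a new label
            label += 1
        elif not paused:
            label += 1              # ordinary row outside a paused block
    return labels

df = pd.DataFrame({
    'Session': [0] * 9 + [1] * 8 + [2] * 4,
    'Action': ['Start', 'Action1', 'Action2', 'Action3', 'Pause', 'Action4',
               'Action5', 'Resume', 'Finish',
               'Start', 'Action1', 'Action2', 'Pause', 'Action3', 'Action4',
               'Resume', 'Finish',
               'Start', 'Action1', 'Action2', 'Finish'],
})

# Run the helper once per session and keep the original row index.
df['Label'] = (
    df.groupby('Session')['Action']
      .transform(lambda s: pd.Series(label_session(s), index=s.index))
)
print(df)
```

Because the counter and the paused flag are local to each call, a session that is missing its Finish or Resume row cannot affect the labels of the next session.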
We can try creating a mask that marks exactly which rows fall inside a paused block (the rows after Pause, up to and including Resume), then mask out those rows before taking the cumulative count within each group:
m = (
    df_input['Action']
    .where(df_input['Action'].isin(['Pause', 'Resume']))
    .groupby(df_input['Session']).shift()
    .ffill().eq('Pause')
)
df_input['Label'] = (
    df_input['Action'].mask(m).notna()
    .groupby(df_input['Session']).cumsum() - 1
)
Session Action Label
0 0 Start 0
1 0 Action1 1
2 0 Action2 2
3 0 Action3 3
4 0 Pause 4
5 0 Action4 4
6 0 Action5 4
7 0 Resume 4
8 0 Finish 5
9 1 Start 0
10 1 Action1 1
11 1 Action2 2
12 1 Pause 3
13 1 Action3 3
14 1 Action4 3
15 1 Resume 3
16 1 Finish 4
17 2 Start 0
18 2 Action1 1
19 2 Action2 2
20 2 Finish 3
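One detail worth noting: in the mask above, shift() is applied per session but ffill() is not, so a Pause that is never followed by Resume would be carried forward into the next session. That does not happen in the sample data. The sketch below uses a small made-up frame (not from the question) to show the difference, and assumes that forward-filling per session is the behaviour you want:

```python
import pandas as pd

# Hypothetical data: session 0 is paused and never resumed.
df = pd.DataFrame({
    'Session': [0, 0, 0, 1, 1, 1],
    'Action': ['Start', 'Pause', 'Action1', 'Start', 'Action1', 'Finish'],
})
sessions = df['Session']

# Keep only the Pause/Resume markers, shifted down one row within each session.
shifted = (
    df['Action']
    .where(df['Action'].isin(['Pause', 'Resume']))
    .groupby(sessions).shift()
)

# Plain ffill carries the last marker across the session boundary.
m_global = shifted.ffill().eq('Pause')
# Grouped ffill stops at the session boundary.
m_grouped = shifted.groupby(sessions).ffill().eq('Pause')

labels_grouped = (
    df['Action'].mask(m_grouped).notna()
    .groupby(sessions).cumsum() - 1
)

print(m_global.tolist())        # [False, False, True, True, True, True]
print(m_grouped.tolist())       # [False, False, True, False, False, False]
print(labels_grouped.tolist())  # [0, 1, 1, 0, 1, 2]
```

With the grouped forward fill, session 1 is labelled 0, 1, 2 as expected even though session 0 was left paused.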
Breakdown of the steps as a DataFrame, for reference:
pd.DataFrame({
    1: df_input['Action']
        .where(df_input['Action'].isin(['Pause', 'Resume'])),
    2: df_input['Action']
        .where(df_input['Action'].isin(['Pause', 'Resume']))
        .groupby(df_input['Session']).shift(),
    3: df_input['Action']
        .where(df_input['Action'].isin(['Pause', 'Resume']))
        .groupby(df_input['Session']).shift()
        .ffill(),
    4: df_input['Action']
        .where(df_input['Action'].isin(['Pause', 'Resume']))
        .groupby(df_input['Session']).shift()
        .ffill().eq('Pause'),
    5: df_input['Action'].mask(m),
    6: df_input['Action'].mask(m).notna(),
    7: df_input['Action'].mask(m).notna()
        .groupby(df_input['Session']).cumsum() - 1
}).rename_axis(columns='Steps')
Steps 1 2 3 4 5 6 7
0 NaN NaN NaN False Start True 0
1 NaN NaN NaN False Action1 True 1
2 NaN NaN NaN False Action2 True 2
3 NaN NaN NaN False Action3 True 3
4 Pause NaN NaN False Pause True 4
5 NaN Pause Pause True NaN False 4
6 NaN NaN Pause True NaN False 4
7 Resume NaN Pause True NaN False 4
8 NaN Resume Resume False Finish True 5
9 NaN NaN Resume False Start True 0
10 NaN NaN Resume False Action1 True 1
11 NaN NaN Resume False Action2 True 2
12 Pause NaN Resume False Pause True 3
13 NaN Pause Pause True NaN False 3
14 NaN NaN Pause True NaN False 3
15 Resume NaN Pause True NaN False 3
16 NaN Resume Resume False Finish True 4
17 NaN NaN Resume False Start True 0
18 NaN NaN Resume False Action1 True 1
19 NaN NaN Resume False Action2 True 2
20 NaN NaN Resume False Finish True 3
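Methods used: Series.where, Series.isin, SeriesGroupBy.shift, Series.ffill, Series.eq, Series.mask, Series.notna, SeriesGroupBy.cumsum.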