[英]How to assign a unique identifier to a sequence of records
I have a dataframe df
,我有一个 dataframe df
,
ID MID TimeType Starttime Endtime next_Starttime
333429 7 NEB 2021-11-01 20:45:17 2021-11-01 20:45:44 2021-11-01 20:45:44
333430 7 AUF 2021-11-01 20:45:44 2021-11-01 21:00:00 2021-11-01 21:00:00
333476 7 AUF 2021-11-01 21:00:00 2021-11-01 21:03:36 2021-11-01 21:03:36
333477 7 NEB 2021-11-01 21:03:36 2021-11-01 21:11:43 2021-11-01 21:11:43
333502 7 AUF 2021-11-01 21:11:43 2021-11-01 21:11:44 2021-11-01 21:11:44
333511 7 AUF 2021-11-01 21:11:44 2021-11-01 21:25:01 2021-11-01 21:25:01
333553 7 AUF 2021-11-01 21:25:01 2021-11-01 21:40:01 2021-11-01 21:40:01
I would like to assign a unique id for the column TimeType
based on the sequence of repetition such that the desired output looks like this,我想根据重复顺序为TimeType
列分配一个唯一的 ID,以便所需的 output 看起来像这样,
ID MID TimeType Starttime Endtime next_Starttime unique_id
333429 7 NEB 2021-11-01 20:45:17 2021-11-01 20:45:44 2021-11-01 20:45:44 1
333430 7 AUF 2021-11-01 20:45:44 2021-11-01 21:00:00 2021-11-01 21:00:00 2
333476 7 AUF 2021-11-01 21:00:00 2021-11-01 21:03:36 2021-11-01 21:03:36 2
333477 7 NEB 2021-11-01 21:03:36 2021-11-01 21:11:43 2021-11-01 21:11:43 3
333502 7 AUF 2021-11-01 21:11:43 2021-11-01 21:11:44 2021-11-01 21:11:44 4
333511 7 AUF 2021-11-01 21:11:44 2021-11-01 21:25:01 2021-11-01 21:25:01 4
333553 7 AUF 2021-11-01 21:25:01 2021-11-01 21:40:01 2021-11-01 21:40:01 4
I tried using for-loops but based on the dataframe size the execution is too slow.我尝试使用 for 循环,但基于 dataframe 大小,执行速度太慢。
You can do你可以做
df['uni_id'] = df['TimeType'].ne(df['TimeType']).cumsum()
You can use shift
to compare the consecutive rows and cumsum
to make a count:您可以使用shift
比较连续的行和cumsum
进行计数:
df['unique_id'] = df['TimeType'].ne(df['TimeType'].shift()).cumsum()
output: output:
ID MID TimeType Starttime Endtime next_Starttime unique_id
0 333429 7 NEB 2021-11-01 20:45:17 2021-11-01 20:45:44 2021-11-01 20:45:44 1
1 333430 7 AUF 2021-11-01 20:45:44 2021-11-01 21:00:00 2021-11-01 21:00:00 2
2 333476 7 AUF 2021-11-01 21:00:00 2021-11-01 21:03:36 2021-11-01 21:03:36 2
3 333477 7 NEB 2021-11-01 21:03:36 2021-11-01 21:11:43 2021-11-01 21:11:43 3
4 333502 7 AUF 2021-11-01 21:11:43 2021-11-01 21:11:44 2021-11-01 21:11:44 4
5 333511 7 AUF 2021-11-01 21:11:44 2021-11-01 21:25:01 2021-11-01 21:25:01 4
6 333553 7 AUF 2021-11-01 21:25:01 2021-11-01 21:40:01 2021-11-01 21:40:01 4
import pdrle
pdrle.get_id(df.TimeType)
# 0 0
# 1 1
# 2 1
# 3 2
# 4 3
# 5 3
# 6 3
# dtype: int64
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.