
How to assign a unique identifier to a sequence of records

I have a dataframe df:

ID      MID  TimeType   Starttime             Endtime               next_Starttime                
333429  7    NEB        2021-11-01 20:45:17   2021-11-01 20:45:44   2021-11-01 20:45:44 
333430  7    AUF        2021-11-01 20:45:44   2021-11-01 21:00:00   2021-11-01 21:00:00 
333476  7    AUF        2021-11-01 21:00:00   2021-11-01 21:03:36   2021-11-01 21:03:36 
333477  7    NEB        2021-11-01 21:03:36   2021-11-01 21:11:43   2021-11-01 21:11:43 
333502  7    AUF        2021-11-01 21:11:43   2021-11-01 21:11:44   2021-11-01 21:11:44 
333511  7    AUF        2021-11-01 21:11:44   2021-11-01 21:25:01   2021-11-01 21:25:01 
333553  7    AUF        2021-11-01 21:25:01   2021-11-01 21:40:01   2021-11-01 21:40:01 

I would like to assign a unique id to each run of consecutive identical values in the TimeType column, so that the desired output looks like this:

ID      MID  TimeType   Starttime             Endtime               next_Starttime       unique_id          
333429  7    NEB        2021-11-01 20:45:17   2021-11-01 20:45:44   2021-11-01 20:45:44  1
333430  7    AUF        2021-11-01 20:45:44   2021-11-01 21:00:00   2021-11-01 21:00:00  2
333476  7    AUF        2021-11-01 21:00:00   2021-11-01 21:03:36   2021-11-01 21:03:36  2
333477  7    NEB        2021-11-01 21:03:36   2021-11-01 21:11:43   2021-11-01 21:11:43  3
333502  7    AUF        2021-11-01 21:11:43   2021-11-01 21:11:44   2021-11-01 21:11:44  4
333511  7    AUF        2021-11-01 21:11:44   2021-11-01 21:25:01   2021-11-01 21:25:01  4
333553  7    AUF        2021-11-01 21:25:01   2021-11-01 21:40:01   2021-11-01 21:40:01  4

I tried using for-loops, but given the size of the dataframe, execution is too slow.

You can do:

df['uni_id'] = df['TimeType'].ne(df['TimeType'].shift()).cumsum()

You can use shift to compare each row with the previous one, and cumsum to turn the resulting change flags into a running count:

df['unique_id'] = df['TimeType'].ne(df['TimeType'].shift()).cumsum()

Output:

       ID  MID TimeType            Starttime              Endtime       next_Starttime  unique_id
0  333429    7      NEB  2021-11-01 20:45:17  2021-11-01 20:45:44  2021-11-01 20:45:44          1
1  333430    7      AUF  2021-11-01 20:45:44  2021-11-01 21:00:00  2021-11-01 21:00:00          2
2  333476    7      AUF  2021-11-01 21:00:00  2021-11-01 21:03:36  2021-11-01 21:03:36          2
3  333477    7      NEB  2021-11-01 21:03:36  2021-11-01 21:11:43  2021-11-01 21:11:43          3
4  333502    7      AUF  2021-11-01 21:11:43  2021-11-01 21:11:44  2021-11-01 21:11:44          4
5  333511    7      AUF  2021-11-01 21:11:44  2021-11-01 21:25:01  2021-11-01 21:25:01          4
6  333553    7      AUF  2021-11-01 21:25:01  2021-11-01 21:40:01  2021-11-01 21:40:01          4
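
To make the shift/cumsum one-liner above easier to follow, here is a minimal sketch that splits it into its three steps. The dataframe is rebuilt from only the ID and TimeType columns of the question, and the intermediate names previous and run_start are illustrative, not part of the original answer.

```python
import pandas as pd

# Reduced copy of the question's data: only the columns needed here.
df = pd.DataFrame({
    "ID": [333429, 333430, 333476, 333477, 333502, 333511, 333553],
    "TimeType": ["NEB", "AUF", "AUF", "NEB", "AUF", "AUF", "AUF"],
})

# Step 1: the previous row's value. The first row has no predecessor,
# so its entry is NaN.
previous = df["TimeType"].shift()

# Step 2: True wherever a row differs from the row before it, which marks
# the start of a new run. The first row is compared with NaN, so it is True.
run_start = df["TimeType"].ne(previous)

# Step 3: a cumulative sum of the booleans numbers the runs 1, 2, 3, ...
df["unique_id"] = run_start.cumsum()

print(df)
```

Because every step is a vectorized Series operation, this avoids the row-by-row Python loop mentioned in the question.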

Alternatively, with the pdrle package (note that the ids start at 0 here):

import pdrle

pdrle.get_id(df.TimeType)
# 0    0
# 1    1
# 2    1
# 3    2
# 4    3
# 5    3
# 6    3
# dtype: int64
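
If adding a dependency is not desirable, the same zero-based numbering shown above can be reproduced with plain pandas. This is a sketch under the assumption that only the numbering matters; it does not claim to mirror how pdrle works internally, and the names time_type and run_id are illustrative.

```python
import pandas as pd

# The TimeType column from the question, as a standalone Series.
time_type = pd.Series(["NEB", "AUF", "AUF", "NEB", "AUF", "AUF", "AUF"])

# Same shift/ne/cumsum idea, minus one so that the first run gets id 0.
run_id = time_type.ne(time_type.shift()).cumsum() - 1

print(run_id.tolist())  # [0, 1, 1, 2, 3, 3, 3]
```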
