I want to generate a unique ID for each sequence in a pandas DataFrame, where the start of each sequence is flagged in another column.
I have the X, Y, and BOOL columns and want to generate the NEW_ID column:
X Y BOOL NEW_ID
x y TRUE 1
x y FALSE 1
x y FALSE 1
x y TRUE 2
x y FALSE 2
x y FALSE 2
x y FALSE 2
x y TRUE 3
x y TRUE 4
x y FALSE 4
I am looking for a solution without any for loops, because the DataFrame is large and looping takes too long.
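Use cumsum on the BOOL column. True counts as 1 and False as 0, so the running total increases by one at every row that starts a new sequence: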
df['New_ID'] = df.BOOL.cumsum()
df
Out[39]:
X Y BOOL NEW_ID New_ID
0 x y True 1 1
1 x y False 1 1
2 x y False 1 1
3 x y True 2 2
4 x y False 2 2
5 x y False 2 2
6 x y False 2 2
7 x y True 3 3
8 x y True 4 4
9 x y False 4 4
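For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and writes the result straight into NEW_ID. The last part is an assumption that goes beyond the question: if BOOL was loaded as the strings "TRUE"/"FALSE" instead of real booleans, comparing against "TRUE" first should give a boolean column that cumsum can work with.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "X": ["x"] * 10,
    "Y": ["y"] * 10,
    "BOOL": [True, False, False, True, False, False, False, True, True, False],
})

# True is summed as 1 and False as 0, so the running total
# goes up by one at every row that starts a new sequence
df["NEW_ID"] = df["BOOL"].cumsum()
print(df["NEW_ID"].tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 4, 4]

# Assumption (not from the question): BOOL holds the strings "TRUE"/"FALSE".
# Simulate that case here, then convert back to booleans before cumsum.
as_text = df["BOOL"].map({True: "TRUE", False: "FALSE"})
ids_from_text = as_text.eq("TRUE").cumsum()
```

Note that any rows before the first True would get the ID 0, since the running total has not been incremented yet.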