
pandas unique id for sequences

I want to generate a unique id for each sequence in a pandas dataframe, where the start of each sequence is flagged in another column.

I have the X, Y, and BOOL columns and want to generate the NEW_ID column:

X  Y  BOOL  NEW_ID

x  y  TRUE    1
x  y  FALSE   1
x  y  FALSE   1
x  y  TRUE    2
x  y  FALSE   2
x  y  FALSE   2
x  y  FALSE   2
x  y  TRUE    3
x  y  TRUE    4
x  y  FALSE   4

I am trying to find a solution without any for loops, as I have a large dataframe and looping takes too long.

Use cumsum on the BOOL column. True counts as 1 and False as 0, so the running total goes up by one at every sequence start:

df['New_ID']=df.BOOL.cumsum()
df
Out[39]: 
   X  Y   BOOL  NEW_ID  New_ID
0  x  y   True       1       1
1  x  y  False       1       1
2  x  y  False       1       1
3  x  y   True       2       2
4  x  y  False       2       2
5  x  y  False       2       2
6  x  y  False       2       2
7  x  y   True       3       3
8  x  y   True       4       4
9  x  y  False       4       4
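
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and it assumes BOOL holds real booleans rather than the strings "TRUE"/"FALSE".

```python
import pandas as pd

# Sample data rebuilt from the question's table (BOOL assumed to be real booleans)
df = pd.DataFrame({
    "X": ["x"] * 10,
    "Y": ["y"] * 10,
    "BOOL": [True, False, False, True, False, False, False, True, True, False],
})

# True is summed as 1 and False as 0, so the cumulative sum
# increases by one on every row that starts a new sequence
df["NEW_ID"] = df["BOOL"].cumsum()

# If BOOL was loaded as text instead (an assumption about the input),
# compare against the string first to get booleans:
# df["NEW_ID"] = df["BOOL"].eq("TRUE").cumsum()

print(df)
```

Note that if the first row is False, the rows before the first True get id 0, since no sequence has started yet.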

