
Counting the number of times an instance appears before a switch in pandas (Python)

I have a DataFrame like the one below, which records the type of an IP at a specific time.

IP   Time                 Type  
101  2018-10-16 01:07:11  A     
101  2018-10-16 01:08:34  A     
101  2018-10-16 02:54:11  B     
101  2018-10-16 14:07:39  A     
102  2018-10-17 01:09:10  A     
102  2018-10-17 01:38:24  A     
102  2018-10-17 02:44:10  A     
102  2018-10-17 14:17:40  C     

How can I create a new column TimeCount which keeps track of the number of times an IP appears before it switches to a new type?

The expected output is shown below:

IP   Time                 Type  TimeCount
101  2018-10-16 01:07:11  A     2
101  2018-10-16 01:08:34  A     2
101  2018-10-16 02:54:11  B     1
101  2018-10-16 14:07:39  A     1
102  2018-10-17 01:09:10  A     3
102  2018-10-17 01:38:24  A     3
102  2018-10-17 02:44:10  A     3
102  2018-10-17 14:17:40  C     1

I'm thinking I should be using shift(), but I'm not sure how to apply it in pandas. If the type never switches again, the count should simply be the number of times the IP appears with that last type.

Use groupby with shift and cumsum to build a key that labels each run of consecutive identical types within an IP, then use transform to count the rows in each run:

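# Label each run of consecutive identical Type values within an IP.
# transform (rather than apply) keeps the original index on recent pandas versions.
s = df.groupby('IP')['Type'].transform(lambda x: x.ne(x.shift()).cumsum())
# Count the rows in each (IP, run) group and broadcast the count to every row.
df['new'] = df['Type'].groupby([df['IP'], s]).transform('count')
df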
IP   Time                 Type  new
101  2018-10-16 01:07:11  A     2
101  2018-10-16 01:08:34  A     2
101  2018-10-16 02:54:11  B     1
101  2018-10-16 14:07:39  A     1
102  2018-10-17 01:09:10  A     3
102  2018-10-17 01:38:24  A     3
102  2018-10-17 02:44:10  A     3
102  2018-10-17 14:17:40  C     1
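
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same run-labelling idea, written without a lambda. It rebuilds the sample frame from the question and writes the result to TimeCount as the question asked. The names prev_type, starts_run and run_id are made up for illustration, and sorting by IP and Time is an assumption about the intended row order, not something the answer above states.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "IP": [101, 101, 101, 101, 102, 102, 102, 102],
    "Time": pd.to_datetime([
        "2018-10-16 01:07:11", "2018-10-16 01:08:34",
        "2018-10-16 02:54:11", "2018-10-16 14:07:39",
        "2018-10-17 01:09:10", "2018-10-17 01:38:24",
        "2018-10-17 02:44:10", "2018-10-17 14:17:40",
    ]),
    "Type": ["A", "A", "B", "A", "A", "A", "A", "C"],
})

# Assumption: rows should be in time order within each IP.
df = df.sort_values(["IP", "Time"]).reset_index(drop=True)

# Previous Type within the same IP (NaN on the first row of each IP).
prev_type = df.groupby("IP")["Type"].shift()

# A row starts a new run when its Type differs from the previous one;
# the first row of every IP always starts a run because NaN != any Type.
starts_run = df["Type"].ne(prev_type)

# A running total of run starts gives each run its own id
# (run_id is a hypothetical name used only in this sketch).
run_id = starts_run.cumsum()

# Size of each run, broadcast back to every row in that run.
df["TimeCount"] = df.groupby(run_id)["Type"].transform("size")

print(df)
```

Because every IP's first row opens a new run, run_id is unique across IPs here, so grouping by run_id alone is enough. Grouping by both IP and a per-IP key, as in the answer above, does the same job.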

