I have a data frame, shown below, that records the type of an IP at a specific time:
IP Time Type
101 2018-10-16 01:07:11 A
101 2018-10-16 01:08:34 A
101 2018-10-16 02:54:11 B
101 2018-10-16 14:07:39 A
102 2018-10-17 01:09:10 A
102 2018-10-17 01:38:24 A
102 2018-10-17 02:44:10 A
102 2018-10-17 14:17:40 C
How can I create a new column, TimeCount, which keeps track of the number of times an IP appears before it switches to a new type?
The expected output is shown below:
IP Time Type TimeCount
101 2018-10-16 01:07:11 A 2
101 2018-10-16 01:08:34 A 2
101 2018-10-16 02:54:11 B 1
101 2018-10-16 14:07:39 A 1
102 2018-10-17 01:09:10 A 3
102 2018-10-17 01:38:24 A 3
102 2018-10-17 02:44:10 A 3
102 2018-10-17 14:17:40 C 1
I'm thinking I should be using shift(), but I'm not sure how to apply it in pandas. If the type never switches again, just keep the count of how many times the IP appears with that last type.
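Let us try groupby with cumsum: first create a key that labels each run of consecutive identical Type values within an IP, then use transform to count the rows in each run.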
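# Key: within each IP, the counter goes up by one whenever Type differs from the previous row.
# group_keys=False keeps the original row index, so s lines up with df in the next step.
s = df.groupby('IP', group_keys=False)['Type'].apply(lambda x: x.ne(x.shift()).cumsum())
# Count the rows in each (IP, key) run and broadcast that count back to every row of the run.
df['new'] = df['Type'].groupby([df['IP'], s]).transform('count')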
df
IP Time Type new
101 2018-10-16 01:07:11 A 2
101 2018-10-16 01:08:34 A 2
101 2018-10-16 02:54:11 B 1
101 2018-10-16 14:07:39 A 1
102 2018-10-17 01:09:10 A 3
102 2018-10-17 01:38:24 A 3
102 2018-10-17 02:44:10 A 3
102 2018-10-17 14:17:40 C 1
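For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shift/cumsum idea, written as a slight variant rather than the exact code above. It rebuilds the sample frame from the question, sorts it so that "consecutive" means consecutive in time, and starts a new run whenever either IP or Type changes from the previous row. The names `new_run` and `run_id` are my own, chosen for illustration, and the sketch assumes Type has no missing values (since `count` skips them).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "IP": [101, 101, 101, 101, 102, 102, 102, 102],
    "Time": pd.to_datetime([
        "2018-10-16 01:07:11", "2018-10-16 01:08:34",
        "2018-10-16 02:54:11", "2018-10-16 14:07:39",
        "2018-10-17 01:09:10", "2018-10-17 01:38:24",
        "2018-10-17 02:44:10", "2018-10-17 14:17:40",
    ]),
    "Type": ["A", "A", "B", "A", "A", "A", "A", "C"],
})

# Rows must be in time order within each IP for runs to be meaningful.
df = df.sort_values(["IP", "Time"]).reset_index(drop=True)

# True on every row where the IP or the Type differs from the previous row,
# i.e. on the first row of each run (the very first row compares against NaN).
new_run = df["IP"].ne(df["IP"].shift()) | df["Type"].ne(df["Type"].shift())

# The running total of run starts gives every run its own id.
run_id = new_run.cumsum()

# Count the rows in each run and broadcast the count back to each row.
df["TimeCount"] = df.groupby(run_id)["Type"].transform("count")

print(df)
```

Because the IP change is folded into `new_run`, a single groupby on `run_id` is enough here; the two-step version above reaches the same counts by grouping on IP first.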