Creating an index for repeated values in a Pandas DataFrame

Question

Suppose I have a DataFrame df:

t status
1 ok
2 ok
3 ok
4 closed
5 closed
6 closed
7 bad input
8 bad input
9 closed
10 closed
11 ok
12 ok
13 closed
14 closed

I want to identify at what times "closed" appears and for how long each occurrence lasts.

So the result should be:

t status    index
1 ok          0
2 ok          0
3 ok          0
4 closed      1
5 closed      1
6 closed      1
7 bad input   0
8 bad input   0
9 closed      2
10 closed     2
11 ok         0
12 ok         0
13 closed     3
14 closed     3

I tried a standard for-loop approach, but it is too slow for a large DataFrame. I am thinking of a solution using numpy where and repeat:

np.where(df['status'] == 'closed', 1, 0)

I am stuck on incrementing the value by 1 each time "closed" reappears.

Answer 1

IIUC, we can use shift and cumsum to build the condition:

df['New'] = 0
# a run starts where the current row is 'closed' and the previous row is not; cumsum numbers the runs
df.loc[df.status == 'closed', 'New'] = (df.status.eq('closed') & df.status.shift().ne('closed')).cumsum()
df
     t     status  New
0    1         ok    0
1    2         ok    0
2    3         ok    0
3    4     closed    1
4    5     closed    1
5    6     closed    1
6    7  bad input    0
7    8  bad input    0
8    9     closed    2
9   10     closed    2
10  11         ok    0
11  12         ok    0
12  13     closed    3
13  14     closed    3
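
The question also asks when each "closed" run starts and how long it lasts. The following is only a sketch of one way to get that from the same shift/cumsum idea; the sample frame is rebuilt from the question, and the names is_closed, run_start and runs are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "t": range(1, 15),
    "status": ["ok"] * 3 + ["closed"] * 3 + ["bad input"] * 2
              + ["closed"] * 2 + ["ok"] * 2 + ["closed"] * 2,
})

is_closed = df["status"].eq("closed")
# True only on the first row of each run of "closed"
run_start = is_closed & ~is_closed.shift(fill_value=False)
# Running count of run starts, set to 0 outside "closed" rows
df["New"] = run_start.cumsum().where(is_closed, 0)

# One row per run: first t, last t, and number of rows in the run
runs = (
    df[is_closed]
    .groupby("New")["t"]
    .agg(start="min", end="max", length="count")
)
print(runs)
```

For the sample data, runs has one row per label 1, 2 and 3, with start times 4, 9 and 13 and lengths 3, 2 and 2. Here length counts rows, so it equals a duration only if t is evenly spaced.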

Answer 2

Trying something different:

import more_itertools as mit

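# index labels of the rows matching the condition (assumes the default 0..n-1 integer index)
s = df[df.status.eq('closed')].index.tolist()
# map each index label to the 1-based number of its consecutive group
d = {v_: k + 1 for k, v in enumerate(mit.consecutive_groups(s)) for v_ in v}
# assign returns a new frame; to keep the result, use df = df.assign(...)
df.assign(New=df.index.map(d).fillna(0).astype(int))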

     t     status  New
0    1         ok    0
1    2         ok    0
2    3         ok    0
3    4     closed    1
4    5     closed    1
5    6     closed    1
6    7  bad input    0
7    8  bad input    0
8    9     closed    2
9   10     closed    2
10  11         ok    0
11  12         ok    0
12  13     closed    3
13  14     closed    3
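
Since the question mentions numpy, the same "consecutive positions" idea can probably be expressed without more_itertools. This is a sketch under the assumption that rows are already in time order; the names pos, new_group and labels are illustrative, and it is a swapped-in numpy variant rather than the method of the answer above.

```python
import numpy as np
import pandas as pd

# Status column rebuilt from the question
status = pd.Series(["ok"] * 3 + ["closed"] * 3 + ["bad input"] * 2
                   + ["closed"] * 2 + ["ok"] * 2 + ["closed"] * 2)

closed = status.eq("closed").to_numpy()
# Integer positions of the "closed" rows
pos = np.flatnonzero(closed)
# A new group starts wherever the gap to the previous position is not 1;
# prepend=-2 makes the first "closed" row always start a group
new_group = np.diff(pos, prepend=-2) != 1

# 0 everywhere, group number on the "closed" rows
labels = np.zeros(len(status), dtype=int)
labels[pos] = np.cumsum(new_group)
print(labels)
```

The result can be attached with df['New'] = labels. Working on positions rather than index labels means this variant does not depend on the frame having a default integer index.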

Answer 1: 2, accepted, 2019-07-28 15:32:05
Answer 2: 1, 2019-07-28 17:14:57
