Create repeating values index in Pandas dataframe
Suppose I have a df:
t status
1 ok
2 ok
3 ok
4 closed
5 closed
6 closed
7 bad input
8 bad input
9 closed
10 closed
11 ok
12 ok
13 closed
14 closed
I want to identify at what time "closed" appears and for how long.
So the result should be:
t status index
1 ok 0
2 ok 0
3 ok 0
4 closed 1
5 closed 1
6 closed 1
7 bad input 0
8 bad input 0
9 closed 2
10 closed 2
11 ok 0
12 ok 0
13 closed 3
14 closed 3
I tried the standard "for loop" approach, but it is not feasible for a large dataframe. I am thinking of a solution using numpy where and repeat:
np.where(tmp['status']=='closed', 1, 0)
I am stuck on adding 1 every time "closed" reappears.
IIUC, we can use shift and cumsum to create the condition:
df['New']=0
df.loc[df.status=='closed','New']=(df.status.eq('closed')&df.status.shift().ne('closed')).cumsum()
df
t status New
0 1 ok 0
1 2 ok 0
2 3 ok 0
3 4 closed 1
4 5 closed 1
5 6 closed 1
6 7 bad input 0
7 8 bad input 0
8 9 closed 2
9 10 closed 2
10 11 ok 0
11 12 ok 0
12 13 closed 3
13 14 closed 3
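To make the shift/cumsum idea easier to follow, here is a minimal sketch that splits the one-liner into named steps. The sample frame is rebuilt from the question, and the intermediate names (is_closed, run_start, run_id) are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "t": range(1, 15),
    "status": ["ok"] * 3 + ["closed"] * 3 + ["bad input"] * 2
              + ["closed"] * 2 + ["ok"] * 2 + ["closed"] * 2,
})

# True on every row whose status is 'closed'.
is_closed = df["status"].eq("closed")

# A run starts where the current row is closed and the previous row is not.
# fill_value=False treats the (missing) row before the first one as not closed.
run_start = is_closed & ~is_closed.shift(fill_value=False)

# The cumulative sum of the start flags numbers the runs 1, 2, 3, ...
run_id = run_start.cumsum()

# Keep the run number only on closed rows; every other row gets 0.
df["New"] = run_id.where(is_closed, 0)
print(df)
```

Because every step is a vectorized Series operation, this should scale to large frames much better than a Python-level loop.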
Trying something different:
import more_itertools as mit
s = df[df.status.eq('closed')].index.tolist()  # index labels of the rows where status is 'closed'
d = {v_: k + 1 for k, v in enumerate(mit.consecutive_groups(s)) for v_ in v}  # map each label to its run number (1, 2, ...)
df.assign(New=df.index.map(d).fillna(0).astype(int))  # returns a new frame; write df = df.assign(...) to keep it
t status New
0 1 ok 0
1 2 ok 0
2 3 ok 0
3 4 closed 1
4 5 closed 1
5 6 closed 1
6 7 bad input 0
7 8 bad input 0
8 9 closed 2
9 10 closed 2
10 11 ok 0
11 12 ok 0
12 13 closed 3
13 14 closed 3
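Once each run of "closed" has its own number, the original question (when does "closed" appear, and for how long) can be answered with a groupby. The following is a rough sketch under the assumption that "how long" means the number of rows in the run; the column names start, end and rows are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "t": range(1, 15),
    "status": ["ok"] * 3 + ["closed"] * 3 + ["bad input"] * 2
              + ["closed"] * 2 + ["ok"] * 2 + ["closed"] * 2,
})

# Number the runs of 'closed' rows; non-closed rows get 0.
is_closed = df["status"].eq("closed")
starts = is_closed & ~is_closed.shift(fill_value=False)
df["New"] = starts.cumsum().where(is_closed, 0)

# One row per run: first time, last time, and number of rows in the run.
summary = (
    df[df["New"] > 0]
    .groupby("New")["t"]
    .agg(start="min", end="max", rows="count")
)
print(summary)
```

If t is a real timestamp rather than a row counter, end - start may be a better measure of duration than the row count.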