Find the middle occurrence of "0" and the first occurrence of "1" of an event in a pandas DataFrame
Hi, I have a pandas DataFrame that has event columns and other columns as well. I want to group by id and take two records out of each group. Within a group, I want to find a run of at least five consecutive 0s (there could be more) that is always followed by a 1. From that pattern I want the middle row of the 0s and the row of the first 1 that comes after them. The 0s must repeat at least five times; if there are more, take the middle row out of the last five.
In short: I want one 0 row and one 1 row per id, where a 1 only counts if five or more consecutive 0s come directly before it. If the pattern occurs several times, take just one occurrence, so that every id with 0s and 1s yields two records.
For example:
import pandas as pd
data={'id':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],
'name': ['a','b','c','d','e','f','g','h','i','j','k','l','m','n'
,'o','p','q','r','s','t','a1','b1','c1','d1','e1','f1','g1','h1','i1','j1','k1','l1','m1','n1'
,'o1','p1','q1','r1','s1','t1','aa','bb','cc','dd','ee','ff',
'gg','hh','ii','jj','kk','ll','mm','nn'
,'oo','pp','qq','rr','ss','tt','aa1','bb1','cc1','dd1','ee1','ff1',
'gg1','hh1','ii1','jj1','kk1','ll1','mm1','nn1'
,'oo1','pp1','qq1','rr1','ss1','tt1'],
'value':[0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0]}
df=pd.DataFrame.from_dict(data)
As output I want two records per id: one for the 0s and one for the 1s. The 0 row should be the middle record of five or more consecutive 0s.
The expected output is:
    id name  value
16   1    q      0
19   1    t      1
64   2  ee1      0
67   2  hh1      1
You can do it using a pivot table and applying masks for the different values. First, group by the (id, value) pair:
df_grouped = df.reset_index().pivot_table(index=['id','value'],
values='name',
aggfunc=lambda x: ','.join(x)
).reset_index()
df_grouped['name'] = df_grouped['name'].str.split(',')
print(df_grouped)
id value name
0 1 0 a,b,d,e,f,g,h,i
1 1 1 c,j
2 2 0 l,m,n,o,p
3 2 1 k,q,r,s,t,u,w
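The join-then-split round trip above would break if a name ever contained a comma. As a possible alternative (not what the answer uses), the names can be collected into lists directly with `groupby(...).agg(list)`. This is only a sketch on a made-up toy frame; the column names follow the question, while `toy` and `grouped` are illustrative names.

```python
import pandas as pd

# Toy frame; the column names follow the question, the values are made up.
toy = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "name":  ["a", "b", "c", "d", "e"],
    "value": [0, 1, 0, 0, 1],
})

# Collect the names of each (id, value) pair straight into Python lists,
# with no string joining or splitting involved.
grouped = toy.groupby(["id", "value"])["name"].agg(list).reset_index()
print(grouped)
```

Each row of `grouped` then holds one (id, value) pair and the list of names that belong to it, in their original order.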
Then select the zeros for each (value == 0, id) pair and keep the middle value:
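import numpy as np

# Keep only the value == 0 rows that collected at least 5 names
mask_zeros = ((df_grouped['value']==0)*
              (df_grouped['name'].apply(len)>=5))
# Pick one name per list: index ceil(n/2) for odd n, index n/2 for even n
df_zeros = mask_zeros*df_grouped['name'].apply(
    lambda x: x[int(np.ceil(.5*len(x)))]
    if len(x)%2==1
    else x[int(.5*len(x))])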
print(df_zeros)
0 f
1
2 o
3
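Note that for an odd-length list the expression above picks index ceil(n/2), which is one position past the exact center (for the five names l, m, n, o, p it returns o, not n). If the exact center is what is wanted, integer division handles both parities. A minimal sketch, where `middle` is a hypothetical helper name and not part of the answer:

```python
def middle(items):
    """Return the center element; for an even count, the upper of the two."""
    return items[len(items) // 2]


print(middle(["l", "m", "n", "o", "p"]))                  # n
print(middle(["a", "b", "d", "e", "f", "g", "h", "i"]))   # f
```

For even-length lists this agrees with the answer's `x[int(.5*len(x))]` branch; only the odd case differs.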
And select the first name for each (value == 1, id) pair:
mask_ones = (df_grouped['value']==1)
df_ones = mask_ones*df_grouped['name'].apply(
lambda x: x[0] if len(x)>0 else None)
print(df_ones)
0
1 c
2
3 k
Then keep only the selected names by assigning:
df_grouped['name'] = df_ones + df_zeros
df_grouped = df_grouped.merge(df.reset_index(),
on=['name','value','id']
).set_index('index')
print(df_grouped)
id value name
index
5 1 0 f
2 1 1 c
14 2 0 o
10 2 1 k
I'll break it down into steps:
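df['New'] = df.value.diff().fillna(0).ne(0).cumsum()  # label each run of equal values
df1 = df.loc[df.value.eq(0)]  # keep only the zero rows
# zero rows that belong to a run of at least 5 consecutive zeros
s2 = df1.groupby(['id', 'New']).filter(lambda x: len(x) >= 5)
# row(s) near the middle of those zeros per id (two rows when the count is even)
s1 = s2.groupby('id').apply(
    lambda x: x.iloc[len(x)//2 - 1:len(x)//2 + 1] if len(x) % 2 == 0 else x.iloc[[(len(x) + 1)//2], :]
).reset_index(level=0, drop=True)
# the row right after the last qualifying zero of each id, plus the middle rows
pd.concat([df.loc[s2.drop_duplicates(['id'], keep='last').index + 1], s1]).sort_index()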
Out[1995]:
id name value New
5 1 f 0 2
6 1 g 0 2
9 1 j 1 3
14 2 o 0 4
16 2 q 1 5
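For comparison, the rule from the question can also be written as an explicit scan over each id's values. This is only a sketch under stated assumptions, not either answerer's method: it takes the first matching pattern per id, uses the middle of the whole zero run (the lower middle when the run length is even), and requires the 1 to follow the zeros directly within the same id. The function name `middle_zero_and_next_one`, the `min_run` parameter, and the toy data are made up for illustration.

```python
import pandas as pd


def middle_zero_and_next_one(df, min_run=5):
    """Per id: the middle row of the first run of >= min_run consecutive 0s
    that is directly followed by a 1, plus the row holding that 1."""
    picked = []
    for _, group in df.groupby("id", sort=False):
        values = group["value"].tolist()
        start = 0
        while start < len(values):
            # Find the end (exclusive) of the run that begins at `start`.
            end = start
            while end < len(values) and values[end] == values[start]:
                end += 1
            long_zero_run = values[start] == 0 and end - start >= min_run
            followed_by_one = end < len(values) and values[end] == 1
            if long_zero_run and followed_by_one:
                # Assumption: lower middle when the run length is even.
                mid = start + (end - start - 1) // 2
                picked.append(group.iloc[[mid, end]])
                break  # Assumption: keep only the first pattern per id.
            start = end
    return pd.concat(picked) if picked else df.iloc[0:0]


# Toy data (made up): id 1 has a short zero run first, then a run of five.
toy = pd.DataFrame({
    "id":    [1] * 9 + [2] * 8,
    "value": [0, 1, 0, 0, 0, 0, 0, 1, 0] + [0, 0, 0, 0, 0, 0, 1, 1],
})
result = middle_zero_and_next_one(toy)
print(result)
```

On the toy frame this selects rows 4 and 7 for id 1 and rows 11 and 15 for id 2. Changing which pattern is kept or which zero counts as the middle only touches the two lines marked as assumptions.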