Select all rows in pandas where a condition is true

Question

I have a DataFrame:

 Id  Seqno. Event
 1     2    A 
 1     3    B 
 1     5    A 
 1     6    A 
 1     7    D
 2     0    E
 2     1    A 
 2     2    B 
 2     4    A 
 2     6    B

For each Id, I want all the events from the second most recent occurrence of event A onward, that is, every row from the point where the count of the most recent A's reaches 2. Seqno. is a sequence number within each Id. The expected output is:

 Id  Seqno. Event 
 1     5    A 
 1     6    A 
 1     7    D
 2     1    A 
 2     2    B 
 2     4    A 
 2     6    B

So far I have tried:

  # x is the DataFrame shown above; the event column is named Event
  y = x.groupby('Id').apply(lambda x: x.Event.eq('A').cumsum().tail(2)).reset_index()
  p = y.groupby('Id').apply(lambda x: x.iloc[0]).reset_index(drop=True)
  q = x.reset_index()
  s = pd.merge(q, p, on='Id')
  dd = s[s['index'] >= s['level_1']]

I was wondering if there is a good way of doing it.

Answer 1

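Use groupby with cumsum, subtract the running count from the total number of A's per group, and filter. The difference is the number of A's that still follow the current row, so keeping the rows where it is at most 1 keeps everything from the second-to-last A onward: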

g = df['Event'].eq('A').groupby(df['Id'])
df[(g.transform('sum') - g.cumsum()).le(1)]

   Id  Seqno. Event
2   1       5     A
3   1       6     A
4   1       7     D
6   2       1     A
7   2       2     B
8   2       4     A
9   2       6     B
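
To make the intermediate values visible, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample frame from the question. The names `is_a`, `total`, `seen`, and `remaining` are illustrative and do not come from the answer, and the boolean mask is cast to `int` only so that the sums are plain integers.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Seqno.": [2, 3, 5, 6, 7, 0, 1, 2, 4, 6],
    "Event": ["A", "B", "A", "A", "D", "E", "A", "B", "A", "B"],
})

# 1 where the event is A, 0 elsewhere
is_a = df["Event"].eq("A").astype(int)
g = is_a.groupby(df["Id"])

total = g.transform("sum")  # number of A's in the whole group, repeated per row
seen = g.cumsum()           # A's seen so far in the group, current row included
remaining = total - seen    # A's that occur strictly after the current row

# Keep rows that have at most one A after them
result = df[remaining.le(1)]
print(result)
```

Changing the threshold to `n - 1` should give the rows since the n-th most recent A, under the same assumptions.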

Answer 2

Thanks to cold, ALollz and Vaishali for the explanation in the comments. Use groupby with cumcount to number the A rows of each group from the end, then reindex to the full index and ffill within each group:

s=df.loc[df.Event=='A'].groupby('Id').cumcount(ascending=False).add(1).reindex(df.index)
s.groupby(df['Id']).ffill()
Out[57]: 
0    3.0
1    3.0
2    2.0
3    1.0
4    1.0
5    NaN
6    2.0
7    2.0
8    1.0
9    1.0
dtype: float64
yourdf=df[s.groupby(df['Id']).ffill()<=2]
yourdf
Out[58]: 
   Id  Seqno. Event
2   1       5     A
3   1       6     A
4   1       7     D
6   2       1     A
7   2       2     B
8   2       4     A
9   2       6     B
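
The same steps can be wrapped in a small helper that takes the number of occurrences as a parameter. This is only a sketch: `rows_since_nth_last` and its parameter names are hypothetical, and it assumes the frame has a unique index, because `reindex` aligns on index labels.

```python
import pandas as pd

def rows_since_nth_last(df, n, event="A", key="Id", col="Event"):
    """Keep, per group, the rows from the n-th most recent `event` onward."""
    mask = df[col].eq(event)
    # Number the matching rows of each group from the end: last match -> 1
    rank = df.loc[mask].groupby(key).cumcount(ascending=False).add(1)
    # Align to all rows (non-matching rows become NaN), then carry each
    # rank forward within its own group
    rank = rank.reindex(df.index).groupby(df[key]).ffill()
    # NaN compares as False, so rows before the first match are dropped
    return df[rank.le(n)]

# Sample data copied from the question
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Seqno.": [2, 3, 5, 6, 7, 0, 1, 2, 4, 6],
    "Event": ["A", "B", "A", "A", "D", "E", "A", "B", "A", "B"],
})

result = rows_since_nth_last(df, 2)
print(result)
```

If a group contains fewer than `n` matching events, this sketch keeps every row from the group's first match onward; whether that is the desired behavior depends on the use case.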

Answer 1: score 3, accepted, 2019-01-22 23:41:28

Answer 2: score 2, 2019-01-22 21:59:34
