Select all rows from where a condition is true in pandas
I have a dataframe:
Id Seqno. Event
1 2 A
1 3 B
1 5 A
1 6 A
1 7 D
2 0 E
2 1 A
2 2 B
2 4 A
2 6 B
For each Id, I want to get all the events that happened from the second most recent occurrence of event A onward (i.e., from where the count of A's, counted back from the latest one, reaches 2). Seqno. is a sequence number within each Id. The expected output is:
Id Seqno. Event
1 5 A
1 6 A
1 7 D
2 1 A
2 2 B
2 4 A
2 6 B
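To reproduce the question, here is a minimal sketch that builds the sample frame and the expected result with pandas. The variable names `df` and `expected` are assumptions; the row labels 2, 3, 4, 6, 7, 8, 9 are the ones the answers below print.

```python
import pandas as pd

# Sample data from the question; column names copied verbatim,
# including the trailing period in "Seqno."
df = pd.DataFrame({
    'Id':     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'Seqno.': [2, 3, 5, 6, 7, 0, 1, 2, 4, 6],
    'Event':  ['A', 'B', 'A', 'A', 'D', 'E', 'A', 'B', 'A', 'B'],
})

# Expected result: for each Id, every row from the second most recent 'A' onward.
# The original row labels are kept so it can be compared with a filtered df.
expected = df.loc[[2, 3, 4, 6, 7, 8, 9]]
print(expected)
```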
So far I have tried:
```python
import pandas as pd

y = x.groupby('Id').apply(lambda g: g.Event.eq('A').cumsum().tail(2)).reset_index()
p = y.groupby('Id').apply(lambda g: g.iloc[0]).reset_index(drop=True)
q = x.reset_index()
s = pd.merge(q, p, on='Id')
dd = s[s['index'] >= s['level_1']]
```
I was wondering if there is a cleaner way to do this.
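The attempt above tries to locate, per Id, the row where the wanted window starts and then keep every row after it. A minimal sketch of that same idea without the merge follows. It assumes a default integer index with each Id's rows stored in Seqno. order, and (an assumption, since the question does not say) that an Id with fewer than two A's keeps everything from its first A. `N`, `a_rows` and `start` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Id':     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'Seqno.': [2, 3, 5, 6, 7, 0, 1, 2, 4, 6],
    'Event':  ['A', 'B', 'A', 'A', 'D', 'E', 'A', 'B', 'A', 'B'],
})

N = 2  # how many of the most recent 'A' events to include

a_rows = df[df['Event'].eq('A')]

# Per Id, the row label of the N-th most recent 'A'. If an Id has fewer than
# N A's, fall back to its first 'A' (assumption: the question doesn't cover it).
start = a_rows.index.to_series().groupby(a_rows['Id']).agg(
    lambda labels: labels.iloc[max(len(labels) - N, 0)]
)

# Keep each row whose label is at or after its Id's start label.
# Ids with no 'A' at all map to NaN, so the comparison drops them.
result = df[df.index.to_series() >= df['Id'].map(start)]
print(result)
```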
Use `groupby` with `cumsum`, subtract the running count from the total count of A's per group, and filter:
```python
g = df['Event'].eq('A').groupby(df['Id'])
df[(g.transform('sum') - g.cumsum()).le(1)]
```
Id Seqno. Event
2 1 5 A
3 1 6 A
4 1 7 D
6 2 1 A
7 2 2 B
8 2 4 A
9 2 6 B
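To make the filter condition easier to follow, here is a sketch on the same sample data that prints the intermediate count next to each row and replaces the hard-coded `le(1)` with `le(N - 1)`. The names `remaining` and `N` are illustrative additions, not part of the answer, and the `astype(int)` cast is a precaution added here.

```python
import pandas as pd

df = pd.DataFrame({
    'Id':     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'Seqno.': [2, 3, 5, 6, 7, 0, 1, 2, 4, 6],
    'Event':  ['A', 'B', 'A', 'A', 'D', 'E', 'A', 'B', 'A', 'B'],
})

N = 2  # number of most recent 'A' events to go back to (2 in the question)

# Cast to int so the grouped sum/cumsum are plain integer counts.
is_a = df['Event'].eq('A').astype(int)
g = is_a.groupby(df['Id'])

# For each row: how many 'A' events come strictly after it within the same Id
# (group total minus the running count up to and including this row).
remaining = g.transform('sum') - g.cumsum()

# When an Id has at least N A's, rows at or after its N-th most recent 'A'
# have at most N - 1 A's after them; earlier rows have at least N.
result = df[remaining.le(N - 1)]
print(df.assign(remaining=remaining))
print(result)
```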
Thanks to cold, ALollz and Vaishali for the explanation in the comments: use `groupby` with `cumcount` on the A rows to number them from the end of each group, then use `reindex` and `ffill` to carry that count to the rows in between:
```python
s = df.loc[df.Event == 'A'].groupby('Id').cumcount(ascending=False).add(1).reindex(df.index)
s.groupby(df['Id']).ffill()
```
Out[57]:
0 3.0
1 3.0
2 2.0
3 1.0
4 1.0
5 NaN
6 2.0
7 2.0
8 1.0
9 1.0
dtype: float64
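To see why `reindex` followed by a grouped `ffill` is needed, this sketch on the same sample data prints the ranks before and after filling. The column names `before_ffill` and `after_ffill` are illustrative only. Row 5 stays NaN because no A has occurred yet in Id 2, so any `<= 2` comparison is False for it.

```python
import pandas as pd

df = pd.DataFrame({
    'Id':     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'Seqno.': [2, 3, 5, 6, 7, 0, 1, 2, 4, 6],
    'Event':  ['A', 'B', 'A', 'A', 'D', 'E', 'A', 'B', 'A', 'B'],
})

is_a = df['Event'] == 'A'

# Rank each 'A' from the end of its Id group: 1 = most recent, 2 = second most recent, ...
rank = df.loc[is_a].groupby('Id').cumcount(ascending=False).add(1)

# Align back to every row; non-'A' rows become NaN.
s = rank.reindex(df.index)

# Forward-fill within each Id so a row inherits the rank of the last 'A' at or before it.
filled = s.groupby(df['Id']).ffill()

print(pd.DataFrame({'Event': df['Event'], 'before_ffill': s, 'after_ffill': filled}))
```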
```python
yourdf = df[s.groupby(df['Id']).ffill() <= 2]
yourdf
```
Out[58]:
Id Seqno. Event
2 1 5 A
3 1 6 A
4 1 7 D
6 2 1 A
7 2 2 B
8 2 4 A
9 2 6 B
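Both answers hard-code the count 2 (via `le(1)` and `<= 2`). Below is a sketch of the second approach as a reusable function; the name `rows_from_nth_last` and its parameters are hypothetical, not part of pandas. One behavioral difference worth knowing: for an Id with fewer than n A's, the `cumsum` answer keeps the whole group, whereas this `ffill` version drops the rows before that Id's first A.

```python
import pandas as pd


def rows_from_nth_last(df, event='A', n=2, id_col='Id', event_col='Event'):
    """Hypothetical helper: per id, keep all rows from the n-th most recent
    `event` onward, using the cumcount/reindex/ffill approach."""
    hit = df[event_col].eq(event)
    # 1 = most recent matching event in its group, 2 = second most recent, ...
    rank = df.loc[hit].groupby(id_col).cumcount(ascending=False).add(1)
    # Carry each rank forward to the following rows of the same id; rows
    # before an id's first match stay NaN and fail the comparison below.
    filled = rank.reindex(df.index).groupby(df[id_col]).ffill()
    return df[filled.le(n)]


df = pd.DataFrame({
    'Id':     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'Seqno.': [2, 3, 5, 6, 7, 0, 1, 2, 4, 6],
    'Event':  ['A', 'B', 'A', 'A', 'D', 'E', 'A', 'B', 'A', 'B'],
})

print(rows_from_nth_last(df, n=2))
print(rows_from_nth_last(df, n=1))  # only from the latest 'A' per Id
```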