How do I get the preceding rows in a Pandas grouped DataFrame based on a condition on the current row?

Question

I have a DataFrame like this:

StringCol Timestamp GroupID Flag
   xyz    20170101   123     yes
   abc    20170101   123     yes
   def    20170101   123     yes
   ghi    20170101   123     no
   abc    20170101   124     yes
   jkl    20170101   124     yes
   pqr    20170101   124     no
   klm    20170101   124     yes
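
To reproduce the examples, the frame above can be built along these lines. This is only a sketch: it assumes Timestamp and GroupID are stored as integers and that the frame has the default integer index, neither of which is stated in the question.

```python
import pandas as pd

# Sample data from the question. The integer dtypes and the default
# RangeIndex (0..7) are assumptions, not something the question confirms.
df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})
```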

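I want to group this by GroupID and, for each group, get the row flagged "no" together with the X rows immediately before it (the DataFrame is already sorted by GroupID and Timestamp).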

So, if X = 2, I would like the result to be:

StringCol Timestamp GroupID Flag
   abc    20170101   123     yes
   def    20170101   123     yes
   ghi    20170101   123     no
   abc    20170101   124     yes
   jkl    20170101   124     yes
   pqr    20170101   124     no

How can I achieve this? Thanks.

Answer 1

This gets the row with the last "no" flag in each group, together with the X rows before it (X = 2 here).

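def prevK(x, k=2):
    # Position (within the group) of the last row flagged 'no'
    i = x.reset_index(drop=True).Flag.eq('no').iloc[::-1].idxmax()
    # Clamp the start at 0: a negative start would otherwise count from the end
    return x.iloc[max(i - k, 0):i + 1, :]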

df.groupby('GroupID', group_keys=False).apply(prevK)

  StringCol  Timestamp  GroupID Flag
1       abc   20170101      123  yes
2       def   20170101      123  yes
3       ghi   20170101      123   no
4       abc   20170101      124  yes
5       jkl   20170101      124  yes
6       pqr   20170101      124   no
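
For reference, here is a self-contained sketch of the same idea with X exposed as a parameter. The helper name `prev_k` and the choice to return nothing for a group that has no "no" row are assumptions for illustration, not part of the answer above. It concatenates the per-group slices instead of calling `apply`, which sidesteps differences between pandas versions in how `apply` treats the grouping column.

```python
import pandas as pd

df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})

def prev_k(group, k=2):
    """Return the last 'no' row of a group plus up to k rows before it."""
    is_no = group["Flag"].eq("no").to_numpy()
    if not is_no.any():
        # Assumption: a group without any 'no' row contributes nothing.
        return group.iloc[0:0]
    last = is_no.nonzero()[0][-1]          # position of the last 'no'
    return group.iloc[max(last - k, 0):last + 1]

# Slice each group and stitch the pieces back together in group order.
result = pd.concat([prev_k(g, k=2) for _, g in df.groupby("GroupID")])
print(result)
```

With the sample data this selects the rows labelled 1 to 6, matching the output shown above.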

Answer 2

If you only need the last "no" row in each group, try drop_duplicates:

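import itertools

df1 = df.copy()
# Keep only the last 'no' row of each group
df = df[df['Flag'].eq('no')].drop_duplicates(['GroupID'], keep='last')

idx = df.index + 1  # exclusive end: one past the 'no' row
idy = df.index - 2  # start: X = 2 rows before the 'no' row
# Relies on a default RangeIndex and does not check group boundaries
df1.loc[list(itertools.chain(*[range(y, x) for x, y in zip(idx, idy)]))]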
Out[512]: 
  StringCol  Timestamp  GroupID Flag
1       abc   20170101      123  yes
2       def   20170101      123  yes
3       ghi   20170101      123   no
4       abc   20170101      124  yes
5       jkl   20170101      124  yes
6       pqr   20170101      124   no
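
The index arithmetic above can reach into the previous group when a "no" row sits within X rows of the start of its group. A possible boundary-safe variant, offered as a sketch rather than as what this answer does, compares each row's position inside its group with the position of that group's last "no". The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})
X = 2

is_no = df["Flag"].eq("no")
# 0-based position of every row inside its own group
pos = df.groupby("GroupID").cumcount()
# Position of the last 'no' in each group, broadcast to all rows of that group
# (NaN for a group without any 'no', which makes both comparisons False)
last_no = pos.where(is_no).groupby(df["GroupID"]).transform("max")

# Keep the 'no' row and the X rows before it, never leaving the group
result = df[(pos >= last_no - X) & (pos <= last_no)]
print(result)
```

Because the mask is built from within-group positions, it does not depend on the index labels being consecutive integers.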

Answer 1
2 Accepted 2018-08-21 23:57:17

Answer 2
1 2018-08-21 23:56:01
