How to get previous rows of a pandas grouped DataFrame based on a condition on the current row?
I have a DataFrame like this:
StringCol Timestamp GroupID Flag
xyz 20170101 123 yes
abc 20170101 123 yes
def 20170101 123 yes
ghi 20170101 123 no
abc 20170101 124 yes
jkl 20170101 124 yes
pqr 20170101 124 no
klm 20170101 124 yes
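I want to group this by GroupID and, for each group, get the row flagged "no" together with the X rows immediately before it (the DataFrame is already sorted by GroupID and Timestamp).
So, if X = 2, I want the result to be: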
StringCol Timestamp GroupID Flag
abc 20170101 123 yes
def 20170101 123 yes
ghi 20170101 123 no
abc 20170101 124 yes
jkl 20170101 124 yes
pqr 20170101 124 no
How can I achieve this? Thanks.
This gets the last row flagged "no" in each group, plus the X rows before it (X = 2 here).
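def prevK(x, k=2):
    # Positional index of the last row flagged 'no' in this group
    i = x.reset_index(drop=True).Flag.eq('no').iloc[::-1].idxmax()
    # Clamp the start at 0 so a negative slice start cannot wrap around
    return x.iloc[max(i - k, 0):i + 1, :]

df.groupby('GroupID', group_keys=False).apply(prevK)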
StringCol Timestamp GroupID Flag
1 abc 20170101 123 yes
2 def 20170101 123 yes
3 ghi 20170101 123 no
4 abc 20170101 124 yes
5 jkl 20170101 124 yes
6 pqr 20170101 124 no
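
If a group can contain more than one "no" row and you want the preceding rows for each of them, a vectorized variant may be easier to reason about. The following is only a minimal sketch, not part of the answer above: the helper name rows_up_to_no and its parameter x are made up for illustration, and it assumes the frame is already sorted within each group and has a unique index. It rebuilds the sample data from the question so it can be run on its own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})


def rows_up_to_no(frame, x=2):
    """Hypothetical helper: keep every 'no' row plus up to x rows before it in its group."""
    is_no = frame["Flag"].eq("no").astype(int)
    keep = is_no.eq(1)
    for step in range(1, x + 1):
        # Look `step` rows ahead within the same group; the NaN produced
        # past the end of a group compares unequal to 1, i.e. False.
        keep |= is_no.groupby(frame["GroupID"]).shift(-step).eq(1)
    return frame[keep]


result = rows_up_to_no(df, x=2)
print(result)
```

Because the shift is done per group, a "no" near the top of one group never pulls in rows from the previous group.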
If you only need the last one in each group, try drop_duplicates:
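import itertools

df1 = df.copy()
# Keep only the last 'no' row of each group (note: this overwrites df)
df = df[df['Flag'].eq('no')].drop_duplicates(['GroupID'], keep='last')
idx = df.index + 1  # exclusive end: one past each 'no' row
idy = df.index - 2  # start: X = 2 rows earlier (assumes a default 0..n-1 integer index)
df1.loc[list(itertools.chain(*[list(range(y, x)) for x, y in zip(idx, idy)]))]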
Out[512]:
StringCol Timestamp GroupID Flag
1 abc 20170101 123 yes
2 def 20170101 123 yes
3 ghi 20170101 123 no
4 abc 20170101 124 yes
5 jkl 20170101 124 yes
6 pqr 20170101 124 no
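
The index arithmetic above relies on a default integer index and on every "no" row having at least X rows before it in the same group. A position-based variant that does not depend on either could look like the sketch below. This is an illustration under those stated concerns rather than the answerer's code: the function name last_no_window and the parameter x are hypothetical, and the frame is assumed to be sorted by GroupID so that each group is contiguous.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "StringCol": ["xyz", "abc", "def", "ghi", "abc", "jkl", "pqr", "klm"],
    "Timestamp": [20170101] * 8,
    "GroupID": [123, 123, 123, 123, 124, 124, 124, 124],
    "Flag": ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes"],
})


def last_no_window(frame, x=2):
    """Hypothetical helper: last 'no' row per group plus up to x rows before it."""
    flat = frame.reset_index(drop=True)  # work with positions 0..n-1
    # Last 'no' row of each group, as in the drop_duplicates approach
    last_no = flat[flat["Flag"].eq("no")].drop_duplicates("GroupID", keep="last")
    # Position of the first row of each group, used to stop at the group boundary
    starts = flat.drop_duplicates("GroupID", keep="first")
    start_by_group = dict(zip(starts["GroupID"], starts.index))
    picks = []
    for end, gid in zip(last_no.index, last_no["GroupID"]):
        begin = max(end - x, start_by_group[gid])
        picks.extend(range(begin, end + 1))
    # Select by position so the original index labels are preserved
    return frame.iloc[picks]


print(last_no_window(df, x=2))
```

Groups without any "no" row are simply skipped, and the original df is left unmodified.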