I have a dataframe like this:
StringCol Timestamp GroupID Flag
xyz 20170101 123 yes
abc 20170101 123 yes
def 20170101 123 yes
ghi 20170101 123 no
abc 20170101 124 yes
jkl 20170101 124 yes
pqr 20170101 124 no
klm 20170101 124 yes
I want to group this by GroupID and, within each group, select every row whose Flag is "no" together with the X rows immediately before it (the dataframe is already sorted by GroupID and Timestamp).
So, if X = 2, I want the result to be something like:
StringCol Timestamp GroupID Flag
abc 20170101 123 yes
def 20170101 123 yes
ghi 20170101 123 no
abc 20170101 124 yes
jkl 20170101 124 yes
pqr 20170101 124 no
How do I achieve this? Thanks.
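To try the answers below, the sample frame can be rebuilt roughly like this. It is a sketch: the Timestamp column is assumed to hold plain integers, since the question does not say whether it is a string or a date type. The final check only makes the "already sorted by GroupID and Timestamp" precondition explicit. It does not re-sort anything.

```python
import pandas as pd

# Sample data from the question (Timestamp assumed to be a plain integer)
df = pd.DataFrame({
    'StringCol': ['xyz', 'abc', 'def', 'ghi', 'abc', 'jkl', 'pqr', 'klm'],
    'Timestamp': [20170101] * 8,
    'GroupID':   [123] * 4 + [124] * 4,
    'Flag':      ['yes', 'yes', 'yes', 'no', 'yes', 'yes', 'no', 'yes'],
})

# Both answers rely on the rows already being ordered by GroupID, then Timestamp;
# this check fails loudly if that precondition does not hold.
order = pd.MultiIndex.from_frame(df[['GroupID', 'Timestamp']])
assert order.is_monotonic_increasing
print(df)
```

The frame keeps the default 0..n-1 index, which the label-based second answer depends on.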
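This returns, for each group, the last "no" row together with the X rows before it (X is the k parameter; the question uses 2).
def prevK(x, k=2):
    flags = x['Flag'].eq('no').reset_index(drop=True)
    if not flags.any():
        return x.iloc[0:0]  # no 'no' row in this group (idxmax would silently pick the wrong row)
    i = flags.iloc[::-1].idxmax()  # position of the last 'no' within the group
    return x.iloc[max(i - k, 0):i + 1]  # clamp at 0 so the slice never wraps around
df.groupby('GroupID', group_keys=False).apply(prevK)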
StringCol Timestamp GroupID Flag
1 abc 20170101 123 yes
2 def 20170101 123 yes
3 ghi 20170101 123 no
4 abc 20170101 124 yes
5 jkl 20170101 124 yes
6 pqr 20170101 124 no
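The answer above keeps only the last "no" per group, while the question asks for every "no" row. A minimal sketch for that case, assuming the frame is already sorted as stated: a row is kept if it is a "no" itself or if one of the next X rows in the same group is a "no", which is what the per-group shift computes. The function name no_rows_with_previous is hypothetical.

```python
import pandas as pd

def no_rows_with_previous(df, x):
    """Keep every 'no' row plus the x rows before it, never crossing a GroupID boundary."""
    is_no = df['Flag'].eq('no')
    keep = is_no.copy()
    by_group = is_no.groupby(df['GroupID'])
    for k in range(1, x + 1):
        # True where the row k positions further down in the same group is a 'no';
        # rows near the end of a group have nothing below them and get False.
        keep |= by_group.shift(-k, fill_value=False).astype(bool)
    return df[keep]

df = pd.DataFrame({
    'StringCol': ['xyz', 'abc', 'def', 'ghi', 'abc', 'jkl', 'pqr', 'klm'],
    'Timestamp': [20170101] * 8,
    'GroupID':   [123] * 4 + [124] * 4,
    'Flag':      ['yes', 'yes', 'yes', 'no', 'yes', 'yes', 'no', 'yes'],
})
print(no_rows_with_previous(df, 2))
```

Because the result is a boolean mask, two "no" rows close together produce overlapping windows without duplicating rows, and groupby().shift never looks across a GroupID boundary.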
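If you only need the last "no" in each group, try drop_duplicates:
import itertools
# last 'no' row of each group; assumes the default 0..n-1 index, since the lookup below uses labels
last_no = df[df['Flag'].eq('no')].drop_duplicates(['GroupID'], keep='last')
# labels from 2 rows before each such row up to and including it (GroupID boundaries are not checked)
rows = itertools.chain.from_iterable(range(max(i - 2, 0), i + 1) for i in last_no.index)
df.loc[list(rows)]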
StringCol Timestamp GroupID Flag
1 abc 20170101 123 yes
2 def 20170101 123 yes
3 ghi 20170101 123 no
4 abc 20170101 124 yes
5 jkl 20170101 124 yes
6 pqr 20170101 124 no
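Both answers lean on row positions or index labels. A vectorized variant of the same "last no per group" idea works with any index and avoids apply. It is sketched here under the assumption that rows are already in GroupID/Timestamp order. The approach numbers the rows within each group with cumcount, broadcasts the position of the group's last "no" with transform('max'), and keeps rows whose position falls in that window. last_no_with_previous is a hypothetical name.

```python
import pandas as pd

def last_no_with_previous(df, x):
    """Rows from the last 'no' in each GroupID back x rows, without apply or index arithmetic."""
    # 0-based position of each row inside its group (relies on the frame's existing order)
    pos = df.groupby('GroupID').cumcount()
    # position of the group's last 'no' row, broadcast to every row; NaN if the group has none
    last_no = pos.where(df['Flag'].eq('no')).groupby(df['GroupID']).transform('max')
    # comparisons against NaN are False, so groups without a 'no' contribute nothing
    return df[pos.between(last_no - x, last_no)]

df = pd.DataFrame({
    'StringCol': ['xyz', 'abc', 'def', 'ghi', 'abc', 'jkl', 'pqr', 'klm'],
    'Timestamp': [20170101] * 8,
    'GroupID':   [123] * 4 + [124] * 4,
    'Flag':      ['yes', 'yes', 'yes', 'no', 'yes', 'yes', 'no', 'yes'],
})
print(last_no_with_previous(df, 2))
```

Since the window is computed from within-group positions, it cannot spill into the previous group, and a non-default index (e.g. strings) is returned unchanged.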