I have a result set from which I want to get the next n rows after (or the previous n rows before) the row that matches a particular cell value.
For example, here is my data:
A B C
1 10 2018-11-01
2 20 2018-10-31
3 30 2018-10-30
4 40 2018-10-29
5 50 2018-10-28
6 60 2018-10-27
I want to get 3 rows before the row where C = 2018-10-28 (a date type), including the C = 2018-10-28 row itself, so my output should be:
A B C
3 30 2018-10-30
4 40 2018-10-29
5 50 2018-10-28
I tried loc, but it needs an index label, so df2 = df2.loc[:C].tail(3) raises TypeError: can't compare datetime.date to int.
Check the dtypes in df: if the dtype of column C is not datetime, convert it to datetime:
df.dtypes
Out[46]:
B int64
C object
dtype: object
df['C'] = pd.to_datetime(df['C'])
df.dtypes
Out[48]:
B int64
C datetime64[ns]
dtype: object
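A self-contained sketch of that check, assuming the question's data with A as the index and C read in as strings. pd.api.types.is_datetime64_any_dtype tests the column's dtype, so the conversion only runs when it is actually needed:

```python
import pandas as pd

# Question's data; C holds strings (object dtype), as after a plain CSV read.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6],
     "B": [10, 20, 30, 40, 50, 60],
     "C": ["2018-11-01", "2018-10-31", "2018-10-30",
           "2018-10-29", "2018-10-28", "2018-10-27"]}
).set_index("A")

# Convert only if C is not already a datetime column.
if not pd.api.types.is_datetime64_any_dtype(df["C"]):
    df["C"] = pd.to_datetime(df["C"])

print(df.dtypes)
```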
Now column C is comparable with datetime-formatted strings:
target_date = "2018-10-28"
df[df['C'] >= target_date].tail(3)
B C
A
3 30 2018-10-30
4 40 2018-10-29
5 50 2018-10-28
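A runnable version of the filter above, rebuilding the question's frame. It relies on the rows being sorted by C in descending order, as they are in the question. If that is not guaranteed, sort first (for example with df.sort_values('C', ascending=False)), otherwise the >= filter may pick up rows that are not adjacent to the target:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6],
     "B": [10, 20, 30, 40, 50, 60],
     "C": ["2018-11-01", "2018-10-31", "2018-10-30",
           "2018-10-29", "2018-10-28", "2018-10-27"]}
).set_index("A")
df["C"] = pd.to_datetime(df["C"])

target_date = "2018-10-28"
n = 3

# Rows are in descending date order, so the rows with C >= target_date are
# exactly the rows up to and including the target; tail(n) keeps the last n.
result = df[df["C"] >= target_date].tail(n)
print(result)
```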
In the more general case (more than one target row, and unordered data), you could use the following approach:
df
A B C
0 10 2018-09-10
1 20 2018-07-11
2 20 2018-06-12
3 30 2018-07-13
4 50 2018-10-28
5 10 2018-11-01
6 20 2018-10-31
7 30 2018-10-30
8 40 2018-10-29
9 50 2018-10-28
10 60 2018-10-27
index = df[df['C'] == '2018-10-28'].index
index
Out:
Int64Index([4, 9], dtype='int64', name=0)
Use slice and .iloc to fetch the targets:
slices = [slice(i, i-3, -1) for i in index]
slices
Out: [slice(4, 1, -1), slice(9, 6, -1)]
pd.concat([df.iloc[sl] for sl in slices])
B C
A
4 50 2018-10-28
3 30 2018-07-13
2 20 2018-06-12
9 50 2018-10-28
8 40 2018-10-29
7 30 2018-10-30
The resulting frame is not sorted, but that is easy to fix. This approach works only for a default integer index (0, 1, 2, ...), because the labels in index are used as positions in .iloc; if you don't have one, you can add it with df.reset_index().
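A sketch of the same idea that sidesteps both caveats, assuming the 11-row frame above. It turns the mask into integer positions with np.flatnonzero, so the index labels don't matter, and uses forward slices so each block keeps the original row order. It also clamps the start at 0: with the reversed slice(i, i-3, -1), a match in one of the first three rows gives an empty result, because a negative stop is counted from the end of the frame.

```python
import numpy as np
import pandas as pd

# The 11-row example from above, in its original (unsorted) order.
dates = ["2018-09-10", "2018-07-11", "2018-06-12", "2018-07-13", "2018-10-28",
         "2018-11-01", "2018-10-31", "2018-10-30", "2018-10-29", "2018-10-28",
         "2018-10-27"]
df = pd.DataFrame({"B": [10, 20, 20, 30, 50, 10, 20, 30, 40, 50, 60],
                   "C": pd.to_datetime(dates)})

n = 3
mask = (df["C"] == "2018-10-28").to_numpy()

# Integer positions of every match, independent of the index labels.
positions = np.flatnonzero(mask)

# For each match, take the n rows ending at (and including) the match;
# max(..., 0) keeps the window from starting before the first row.
pieces = [df.iloc[max(p - n + 1, 0): p + 1] for p in positions]
result = pd.concat(pieces)
print(result)
```

If two matches are fewer than n rows apart, their windows overlap and the shared rows appear twice in the result.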
I am interested to get 3 rows before the row where C = 2018-10-28
First find the index via pd.Series.idxmax, then slice using pd.DataFrame.iloc, which supports integer positional indexing:
idx = df['C'].eq('2018-10-28').idxmax()
res = df.iloc[idx-2: idx+1]
print(res)
# A B C
# 2 3 30 2018-10-30
# 3 4 40 2018-10-29
# 4 5 50 2018-10-28
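One caveat: if no row matches, the eq(...) mask is all False and idxmax returns the first label, so the slice silently returns the wrong rows (or nothing). Also, idxmax returns a label while iloc expects a position, and the two only coincide for a default RangeIndex. A guarded sketch using a hypothetical helper, previous_rows, tested on a frame with a non-numeric index:

```python
import numpy as np
import pandas as pd


def previous_rows(df, column, value, n):
    """Hypothetical helper: up to n rows ending at the first row where df[column] == value."""
    mask = (df[column] == value).to_numpy()
    if not mask.any():
        # idxmax would quietly return the first label here; fail loudly instead.
        raise KeyError(f"no row with {column} == {value!r}")
    pos = int(np.argmax(mask))  # integer position of the first True, not a label
    return df.iloc[max(pos - n + 1, 0): pos + 1]


# Question's data, deliberately given a non-numeric index.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6],
     "B": [10, 20, 30, 40, 50, 60],
     "C": pd.to_datetime(["2018-11-01", "2018-10-31", "2018-10-30",
                          "2018-10-29", "2018-10-28", "2018-10-27"])},
    index=list("uvwxyz"),
)

res = previous_rows(df, "C", pd.Timestamp("2018-10-28"), 3)
print(res)
```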
You can use something like this:
from io import StringIO
import pandas as pd

s = StringIO("""
A B C
1 10 2018-11-01
2 20 2018-10-31
3 30 2018-10-30
4 40 2018-10-29
5 50 2018-10-28
6 60 2018-10-27""")
final = pd.read_csv(s, sep=r'\s+', engine='python')
final['C'] = pd.to_datetime(final['C'])
final
A B C
0 1 10 2018-11-01
1 2 20 2018-10-31
2 3 30 2018-10-30
3 4 40 2018-10-29
4 5 50 2018-10-28
5 6 60 2018-10-27
idx = final[final['C'] == '2018-10-28'].index[0]
final.loc[idx-2:idx]
Output
A B C
2 3 30 2018-10-30
3 4 40 2018-10-29
4 5 50 2018-10-28
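The question also asks for the next n rows after the match. A mirror-image sketch, assuming that "next n rows" should include the matching row, just as the expected output does for the backward case; next_rows is a hypothetical helper. It uses iloc, whose end position is exclusive, whereas the loc slice above includes both endpoint labels:

```python
from io import StringIO

import pandas as pd

# Same data as above, read the same way.
s = StringIO("""A B C
1 10 2018-11-01
2 20 2018-10-31
3 30 2018-10-30
4 40 2018-10-29
5 50 2018-10-28
6 60 2018-10-27""")
final = pd.read_csv(s, sep=r"\s+")
final["C"] = pd.to_datetime(final["C"])


def next_rows(df, column, value, n):
    """Hypothetical helper: the first matching row plus up to n - 1 rows after it."""
    positions = (df[column] == value).to_numpy().nonzero()[0]
    if len(positions) == 0:
        raise KeyError(f"no row with {column} == {value!r}")
    pos = positions[0]
    # iloc excludes the end position, so pos + n yields at most n rows;
    # near the bottom of the frame fewer rows are returned.
    return df.iloc[pos: pos + n]


out = next_rows(final, "C", pd.Timestamp("2018-10-31"), 3)
print(out)
```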