
Python, Pandas: Groupby Threshold Value

I have a DataFrame such as the following:

[image of the DataFrame]

I would like to use the GroupBy method to return the rows that match conditions such as:
"All rows where 'gain_by_mae' > 1",
"All rows where 'entry_time' > 8:00 and 'entry_time' < 16:00 and 'gain_by_mae' > 1",
etc.

Is there any way to do this kind of filtering with the GroupBy method?

Here is a snippet to reconstruct the DataFrame:

import pandas as pd
from pandas import Timestamp
from numpy import inf  # 'inf' appears in the 'gain_by_mae' column below
dikt={'direction': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}, 'gain': {0: 1.0, 1: 1.0, 2: 0.75, 3: 0.75, 4: 1.25, 5: 0.5, 6: 0.75, 7: 0.5}, 'peak': {0: 1220.75, 1: 1220.75, 2: 1220.75, 3: 1220.75, 4: 1221.0, 5: 1221.0, 6: 1220.75, 7: 1221.5}, 'entry_time': {0: Timestamp('2005-03-08 20:00:00'), 1: Timestamp('2005-03-08 20:30:00'), 2: Timestamp('2005-03-08 21:00:00'), 3: Timestamp('2005-03-08 21:30:00'), 4: Timestamp('2005-03-08 22:00:00'), 5: Timestamp('2005-03-08 22:30:00'), 6: Timestamp('2005-03-08 23:00:00'), 7: Timestamp('2005-03-08 23:30:00')}, 'gain_by_mae': {0: 2.0, 1: 2.0, 2: 1.5, 3: 1.5, 4: 5.0, 5: 2.0, 6: inf, 7: inf}, 'trough': {0: 1220.25, 1: 1220.25, 2: 1220.25, 3: 1220.25, 4: 1220.75, 5: 1220.75, 6: 1220.75, 7: 1221.5}, 'exit_time': {0: Timestamp('2005-03-09 00:00:00'), 1: Timestamp('2005-03-09 00:00:00'), 2: Timestamp('2005-03-09 00:00:00'), 3: Timestamp('2005-03-09 00:00:00'), 4: Timestamp('2005-03-09 00:00:00'), 5: Timestamp('2005-03-09 00:00:00'), 6: Timestamp('2005-03-09 00:00:00'), 7: Timestamp('2005-03-09 00:00:00')}, 'trough_idx': {0: Timestamp('2005-03-08 21:30:00'), 1: Timestamp('2005-03-08 21:30:00'), 2: Timestamp('2005-03-08 21:30:00'), 3: Timestamp('2005-03-08 22:00:00'), 4: Timestamp('2005-03-08 23:00:00'), 5: Timestamp('2005-03-08 23:00:00'), 6: Timestamp('2005-03-08 23:30:00'), 7: Timestamp('2005-03-09 00:00:00')}, 'peak_idx': {0: Timestamp('2005-03-08 21:00:00'), 1: Timestamp('2005-03-08 21:00:00'), 2: Timestamp('2005-03-08 21:00:00'), 3: Timestamp('2005-03-08 21:30:00'), 4: Timestamp('2005-03-08 22:30:00'), 5: Timestamp('2005-03-08 22:30:00'), 6: Timestamp('2005-03-08 23:00:00'), 7: Timestamp('2005-03-09 00:00:00')}, 'exit_price': {0: 1221.5, 1: 1221.5, 2: 1221.5, 3: 1221.5, 4: 1221.5, 5: 1221.5, 6: 1221.5, 7: 1221.5}, 'mae': {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.25, 5: 0.25, 6: 0.0, 7: 0.0}, 'entry_price': {0: 1220.5, 1: 1220.5, 2: 1220.75, 3: 1220.75, 4: 1220.25, 5: 1221.0, 6: 1220.75, 7: 1221.0}}
df = pd.DataFrame(dikt, columns=['entry_time', 'exit_time', 'entry_price', 'exit_price', 'direction', 'gain', 'peak', 'peak_idx', 'mae', 'trough_idx', 'trough', 'gain_by_mae'])

You don't need to use GroupBy to achieve what you're asking. Simple selection is sufficient:

df_filtered = df[df['gain_by_mae'] > 3]
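
To make the mechanics explicit: the comparison first produces a boolean Series, and indexing the frame with it keeps only the rows marked True. Below is a minimal sketch on a small stand-in frame. The four values are borrowed from the question's 'gain_by_mae' column, and the threshold of 1.5 is just an illustrative choice. Note that rows holding inf pass any finite threshold.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; values borrowed from the question's 'gain_by_mae' column.
df = pd.DataFrame({"gain_by_mae": [2.0, 1.5, 5.0, np.inf]})

# The comparison yields a boolean Series aligned with df's index.
mask = df["gain_by_mae"] > 1.5

# Indexing with the mask keeps only the rows where it is True.
df_filtered = df[mask]
print(df_filtered)
```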

You can also chain filters with boolean operators:

df_filtered = df[(df.gain_by_mae > 3) & (df.direction != 1)]
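
The time-of-day condition from the question can probably be handled the same way, by comparing the time component of 'entry_time' against fixed bounds. The following is only a sketch under assumptions: the frame is a hypothetical stand-in with made-up rows (all entry times in the question's sample data fall after 20:00, so that data would yield an empty result), and the helper names entry_tod and in_session are illustrative.

```python
from datetime import time

import pandas as pd

# Hypothetical stand-in data; only the column names come from the question.
df = pd.DataFrame({
    "entry_time": pd.to_datetime([
        "2005-03-08 07:30:00",
        "2005-03-08 09:00:00",
        "2005-03-08 12:30:00",
        "2005-03-08 20:00:00",
    ]),
    "gain_by_mae": [2.0, 0.5, 1.5, 5.0],
})

# Extract the time of day (datetime.time objects) from each timestamp.
entry_tod = df["entry_time"].dt.time

# Strictly between 08:00 and 16:00, matching the question's '>' and '<'.
in_session = (entry_tod > time(8, 0)) & (entry_tod < time(16, 0))

# Combine with the threshold on 'gain_by_mae' using '&'.
df_filtered = df[in_session & (df["gain_by_mae"] > 1)]
print(df_filtered)
```

Each condition needs its own parentheses because '&' binds more tightly than the comparison operators.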
