After finding the max value, find the succeeding min value in a separate column
In my dataframe, I have these columns:
Date, Time (5-minute buckets: 7:00, 7:05, 7:10, etc.), High, Low
What I would like to do is find the max in the 'High' column, THEN the min in the 'Low' column that follows it.
Take that difference (High - Low) and pass it to a new column that says:
"If the 'High'.max is in the 7:20 row and the low is in the 7:50 row, what is the difference? Place that difference on the 7:20 row."
At the end of all of this, I'd like to have the mean or median of all the 'High' - 'Low' differences by 'Time'.
For example (from a large dataframe):
Date Time Ticker High Low Range
0 01/02/18 7:05 USD/JPY 112.170 112.150
1 01/02/18 7:10 USD/JPY 112.175 112.140
2 01/02/18 7:15 USD/JPY 112.185 112.170
3 01/02/18 7:20 USD/JPY 112.180 112.155 112.180-112.080 = .10
4 01/02/18 7:25 USD/JPY 112.160 112.145
5 01/02/18 7:30 USD/JPY 112.160 112.155
6 01/02/18 7:35 USD/JPY 112.160 112.120
7 01/02/18 7:40 USD/JPY 112.145 112.100
8 01/02/18 7:45 USD/JPY 112.120 112.085
9 01/02/18 7:50 USD/JPY 112.155 112.080
10 01/02/18 7:55 USD/JPY 112.150 112.130
32898 07/05/19 11:35 USD/JPY 108.545 108.525
32899 07/05/19 11:40 USD/JPY 108.550 108.535
32900 07/05/19 11:45 USD/JPY 108.560 108.530 108.560-108.525 = .035
32901 07/05/19 11:50 USD/JPY 108.550 108.540
32902 07/05/19 11:55 USD/JPY 108.535 108.525
32903 07/05/19 12:00 USD/JPY 108.550 108.530
32904 07/05/19 12:05 USD/JPY 108.555 108.530
32905 07/05/19 12:10 USD/JPY 108.560 108.540
32906 07/05/19 12:15 USD/JPY 108.560 108.540
Desired output:
Time Range (median or avg for all of the instances where the max High was at 7:20, etc.)
7:20 .10
11:45 .035
Do I use a lambda to make sure I'm only finding the Low.min after finding the High.max for each day?
I know I can group by 'Date' and find the max for each date.
#High grouped by Date
df2 = df.loc[df.groupby('Date')['High'].idxmax()]
I can also compute the range, but I need the range measured from the High.max to the Low.min that comes after it, for each date, and then grouped by time.
# Difference between High and Low on each row
# (named price_range so it does not shadow the built-in range)
price_range = df['High'] - df['Low']
But I don't know how to find the min that comes after the max and write that difference back to the row where the max occurred.
As I already commented, the first max occurs at 7:15, not 7:20. Anyhow, here's my approach:
new_df = df.groupby('Date').agg({'High': 'idxmax', 'Low':'min'})
# copy the time
new_df['Time'] = df.loc[new_df.High, 'Time'].values
# compute the range
new_df['Range'] = df.loc[new_df.High, 'High'].values - new_df.Low
new_df.drop(['High','Low'], axis=1)
gives:
Time Range
Date
01/02/18 7:15 0.105
07/05/19 11:45 0.035
To get the min after the max, you can filter the rows within each groupby group:
df.groupby('Date').apply(lambda x: x.High.max() - x[x.index > x.High.idxmax()].Low.min())
Result:
Date
01/02/18 0.105
07/05/19 0.035
To verify that this works correctly, set, for example, the Low of the first row to 112.000, so that the day's absolute min occurs before the max. The result for 01/02/18 should then stay at 0.105 rather than become 0.185.
If you need the time info too, convert this to a dataframe and insert the time column:
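The check described above can be sketched on a small made-up frame. The sample values and the helper name `range_after_max` are assumptions for illustration, not part of the original answer; the sketch compares the "min after max" range with the whole-day range when the day's lowest Low comes before the max High.

```python
import pandas as pd

# Made-up sample: the day's lowest Low (112.000) occurs BEFORE the max High.
df = pd.DataFrame({
    "Date": ["01/02/18"] * 5,
    "Time": ["7:05", "7:10", "7:15", "7:20", "7:25"],
    "High": [112.170, 112.175, 112.185, 112.180, 112.160],
    "Low":  [112.000, 112.140, 112.170, 112.155, 112.145],
})

def range_after_max(day):
    """Max High minus the lowest Low among the rows after the max (hypothetical helper)."""
    after = day[day.index > day["High"].idxmax()]
    return day["High"].max() - after["Low"].min()

# Range measured only against Lows that come after the max High.
after_max = df.groupby("Date")[["High", "Low"]].apply(range_after_max)

# Range measured against the lowest Low of the whole day, for comparison.
whole_day = df.groupby("Date")["High"].max() - df.groupby("Date")["Low"].min()

print(after_max.round(3))  # 0.04 for 01/02/18
print(whole_day.round(3))  # 0.185 for 01/02/18
```

If the two numbers differ, the filter is doing its job: the early 112.000 low is ignored because it precedes the 7:15 max.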
res = df.groupby('Date').apply(lambda x: x.High.max() - x[x.index > x.High.idxmax()].Low.min()).to_frame('Range')
res.insert(0,'Time',df.loc[df.groupby('Date')['High'].idxmax(),'Time'].values)
Final result:
Time Range
Date
01/02/18 7:15 0.105
07/05/19 11:45 0.035
UPDATE
If you would rather insert the ranges as a new column in the original dataframe:
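grouped = df.groupby('Date')
# max High minus the lowest Low in the rows after the max, one value per date
ranges = grouped.apply(lambda x: x.High.max() - x[x.index > x.High.idxmax()].Low.min())
# write each value onto the row where that day's max High occurred
df.loc[grouped['High'].idxmax().values, 'Range'] = ranges.values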
Output:
Date Time Ticker High Low Range
0 01/02/18 7:05 USD/JPY 112.170 112.000 NaN
1 01/02/18 7:10 USD/JPY 112.175 112.140 NaN
2 01/02/18 7:15 USD/JPY 112.185 112.170 0.105
3 01/02/18 7:20 USD/JPY 112.180 112.155 NaN
4 01/02/18 7:25 USD/JPY 112.160 112.145 NaN
5 01/02/18 7:30 USD/JPY 112.160 112.155 NaN
6 01/02/18 7:35 USD/JPY 112.160 112.120 NaN
7 01/02/18 7:40 USD/JPY 112.145 112.100 NaN
8 01/02/18 7:45 USD/JPY 112.120 112.085 NaN
9 01/02/18 7:50 USD/JPY 112.155 112.080 NaN
10 01/02/18 7:55 USD/JPY 112.150 112.130 NaN
32898 07/05/19 11:35 USD/JPY 108.545 108.525 NaN
32899 07/05/19 11:40 USD/JPY 108.550 108.535 NaN
32900 07/05/19 11:45 USD/JPY 108.560 108.530 0.035
32901 07/05/19 11:50 USD/JPY 108.550 108.540 NaN
32902 07/05/19 11:55 USD/JPY 108.535 108.525 NaN
32903 07/05/19 12:00 USD/JPY 108.550 108.530 NaN
32904 07/05/19 12:05 USD/JPY 108.555 108.530 NaN
32905 07/05/19 12:10 USD/JPY 108.560 108.540 NaN
32906 07/05/19 12:15 USD/JPY 108.560 108.540 NaN
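Two situations are worth checking when writing the Range column this way: a day whose max High falls on its last row (no later Low exists, so the range is NaN), and an index that is not increasing within a day (the `x.index > idxmax` filter relies on that). The following is a sketch of a positional variant under those assumptions; the sample data and the helper name `range_after_max` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample: on the second day the max High is the last row of the day.
df = pd.DataFrame({
    "Date": ["01/02/18"] * 4 + ["01/03/18"] * 3,
    "Time": ["7:05", "7:10", "7:15", "7:20", "7:05", "7:10", "7:15"],
    "High": [112.170, 112.185, 112.180, 112.160, 111.900, 111.950, 111.990],
    "Low":  [112.150, 112.170, 112.155, 112.145, 111.880, 111.910, 111.960],
})

def range_after_max(day):
    """Positional version: uses row order within the day, not index labels."""
    pos = int(np.argmax(day["High"].to_numpy()))   # position of the first max
    later_lows = day["Low"].to_numpy()[pos + 1:]   # Lows strictly after the max
    if later_lows.size == 0:                       # max is the day's last row
        return np.nan
    return day["High"].iloc[pos] - later_lows.min()

idx = df.groupby("Date")["High"].idxmax()          # row label of each day's max
ranges = df.groupby("Date")[["High", "Low"]].apply(range_after_max)

# Both results are ordered by Date, so they can be matched by position.
df["Range"] = np.nan
df.loc[idx.to_numpy(), "Range"] = ranges.to_numpy()
print(df)
```

Days that end on their high simply keep NaN in Range, so a later `groupby('Time')` mean or median skips them.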