
After finding max value, find succeeding min value in separate column

In my dataframe, I have these columns:

Date, Time (5-minute buckets: 7:00, 7:05, 7:10, etc.), High, Low

What I would like to do is find the max in the 'High' column, THEN the min in the 'Low' column that follows it.

Take that difference, so essentially High - Low, and pass it to a new column that says:

"If the 'High'.max is in the 7:20 row and the low is in the 7:50 row, what is the difference? Place that difference on the row next to 7:20."

At the end of all of this, I'd like to have the mean or median of all the 'High' - 'Low' differences by 'Time'.

For example (from a large dataframe):

           Date   Time   Ticker     High      Low    Range
0      01/02/18   7:05  USD/JPY  112.170  112.150
1      01/02/18   7:10  USD/JPY  112.175  112.140
2      01/02/18   7:15  USD/JPY  112.185  112.170
3      01/02/18   7:20  USD/JPY  112.180  112.155   112.180-112.080 = .10
4      01/02/18   7:25  USD/JPY  112.160  112.145
5      01/02/18   7:30  USD/JPY  112.160  112.155
6      01/02/18   7:35  USD/JPY  112.160  112.120
7      01/02/18   7:40  USD/JPY  112.145  112.100
8      01/02/18   7:45  USD/JPY  112.120  112.085
9      01/02/18   7:50  USD/JPY  112.155  112.080
10     01/02/18   7:55  USD/JPY  112.150  112.130
32898  07/05/19  11:35  USD/JPY  108.545  108.525
32899  07/05/19  11:40  USD/JPY  108.550  108.535
32900  07/05/19  11:45  USD/JPY  108.560  108.530   108.560-108.525 = .035
32901  07/05/19  11:50  USD/JPY  108.550  108.540
32902  07/05/19  11:55  USD/JPY  108.535  108.525
32903  07/05/19  12:00  USD/JPY  108.550  108.530
32904  07/05/19  12:05  USD/JPY  108.555  108.530
32905  07/05/19  12:10  USD/JPY  108.560  108.540
32906  07/05/19  12:15  USD/JPY  108.560  108.540

Desired output:

Time    Range (median or avg for all of the instances where the Max High was 7:20, etc.)
7:20    .10
11:45   .035

Do I use a lambda to make sure I'm only finding the Low.min after finding the High.max for each day?

I know I can group by 'Date' and find the max for each date.

#High grouped by Date
df2 = df.loc[df.groupby('Date')['High'].idxmax()]

And I can find the range, but I need the range taken AFTER the High.max: first find the High.max, then the Low.min that follows it for each date, then summarize by time.

# Difference between High and Low (named so it does not shadow the built-in range)
high_low_range = df['High'] - df['Low']

But I don't know how to find the min after finding the max and return that difference to the row where the max occurred.

As I already commented, the first max occurs at 7:15, not 7:20. Anyhow, here's my approach:

new_df = df.groupby('Date').agg({'High': 'idxmax', 'Low':'min'})

# copy the time
new_df['Time'] = df.loc[new_df.High, 'Time'].values

# compute the range
new_df['Range'] = df.loc[new_df.High, 'High'].values - new_df.Low

new_df.drop(['High','Low'], axis=1)

gives:

           Time  Range
Date                  
01/02/18   7:15  0.105
07/05/19  11:45  0.035

To get the min after the max, you can filter the rows within the groupby groups:

df.groupby('Date').apply(lambda x: x.High.max() - x[x.index > x.High.idxmax()].Low.min())

Result:

Date
01/02/18    0.105
07/05/19    0.035

To verify that this works correctly, you'll have to set, e.g., the Low of the first row to 112.000, i.e. make the day's absolute min occur before the max.
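That check can be sketched on a small made-up day. The values, the frame name `toy`, and the helper `range_after_peak` below are illustrative assumptions, not part of the question's data. The day's lowest Low is placed before the High peak, so the whole-day minimum and the post-peak minimum should disagree:

```python
import pandas as pd

# Hypothetical toy data: the day's absolute Low (112.000) occurs BEFORE the High peak.
toy = pd.DataFrame({
    'Date': ['01/02/18'] * 4,
    'Time': ['7:05', '7:10', '7:15', '7:20'],
    'High': [112.170, 112.185, 112.180, 112.160],
    'Low':  [112.000, 112.140, 112.155, 112.120],
})

def range_after_peak(day):
    """Day's High peak minus the lowest Low on rows strictly after the peak."""
    after_peak = day[day.index > day['High'].idxmax()]
    return day['High'].max() - after_peak['Low'].min()

# Post-peak range per day (same filter as the lambda above, written out as a function).
post_peak = toy.groupby('Date')[['High', 'Low']].apply(range_after_peak)

# Whole-day range per day, which ignores the order of the rows.
whole_day = toy.groupby('Date')['High'].max() - toy.groupby('Date')['Low'].min()

print(post_peak)
print(whole_day)
```

If the filter works, `post_peak` should be smaller than `whole_day` here, because the 112.000 low is excluded. Note that the filter assumes the index is in chronological order within each day.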


If you need the time info too, convert this to a dataframe and insert the time column:

res = df.groupby('Date').apply(lambda x: x.High.max() - x[x.index > x.High.idxmax()].Low.min()).to_frame('Range')
res.insert(0,'Time',df.loc[df.groupby('Date')['High'].idxmax(),'Time'].values)

Final result:

           Time  Range
Date                  
01/02/18   7:15  0.105
07/05/19  11:45  0.035
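The question also asked for the mean or median of these ranges per peak time, which the steps above stop short of. Assuming a per-day result shaped like the Time/Range table shown here, one more groupby on `Time` would likely do it. The rows below, including the extra date, are invented for illustration:

```python
import pandas as pd

# Hypothetical per-day results in the same Time/Range shape (values invented for illustration).
res = pd.DataFrame(
    {'Time': ['7:15', '11:45', '7:15'], 'Range': [0.105, 0.035, 0.095]},
    index=pd.Index(['01/02/18', '07/05/19', '07/08/19'], name='Date'),
)

# One row per peak time: how many days peaked then, and the typical range that followed.
by_time = res.groupby('Time')['Range'].agg(['count', 'mean', 'median'])

print(by_time)
```

Keeping `count` next to the averages makes it easier to spot times that are backed by only one or two days.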


UPDATE
If you'd rather insert the ranges as a new column in the original dataframe:

df.loc[df.groupby('Date')['High'].idxmax().values,'Range']=df.groupby('Date').apply(lambda x: x.High.max() - x[x.index > x.High.idxmax()].Low.min()).values

Output:

           Date   Time   Ticker     High      Low  Range
0      01/02/18   7:05  USD/JPY  112.170  112.000    NaN
1      01/02/18   7:10  USD/JPY  112.175  112.140    NaN
2      01/02/18   7:15  USD/JPY  112.185  112.170  0.105
3      01/02/18   7:20  USD/JPY  112.180  112.155    NaN
4      01/02/18   7:25  USD/JPY  112.160  112.145    NaN
5      01/02/18   7:30  USD/JPY  112.160  112.155    NaN
6      01/02/18   7:35  USD/JPY  112.160  112.120    NaN
7      01/02/18   7:40  USD/JPY  112.145  112.100    NaN
8      01/02/18   7:45  USD/JPY  112.120  112.085    NaN
9      01/02/18   7:50  USD/JPY  112.155  112.080    NaN
10     01/02/18   7:55  USD/JPY  112.150  112.130    NaN
32898  07/05/19  11:35  USD/JPY  108.545  108.525    NaN
32899  07/05/19  11:40  USD/JPY  108.550  108.535    NaN
32900  07/05/19  11:45  USD/JPY  108.560  108.530  0.035
32901  07/05/19  11:50  USD/JPY  108.550  108.540    NaN
32902  07/05/19  11:55  USD/JPY  108.535  108.525    NaN
32903  07/05/19  12:00  USD/JPY  108.550  108.530    NaN
32904  07/05/19  12:05  USD/JPY  108.555  108.530    NaN
32905  07/05/19  12:10  USD/JPY  108.560  108.540    NaN
32906  07/05/19  12:15  USD/JPY  108.560  108.540    NaN
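The same column can also be built without `apply`, which may help on a large frame. This is only a sketch under two assumptions: the index is unique, and it is in chronological order within each day. The toy values and the names `toy`, `row_peak` and `low_after` are made up for the example:

```python
import pandas as pd

# Toy frame (hypothetical values); the index is assumed unique and chronological.
toy = pd.DataFrame({
    'Date': ['01/02/18'] * 3 + ['07/05/19'] * 3,
    'Time': ['7:10', '7:15', '7:20', '11:45', '11:50', '11:55'],
    'High': [112.175, 112.185, 112.180, 108.560, 108.550, 108.535],
    'Low':  [112.140, 112.170, 112.155, 108.530, 108.540, 108.525],
})

# Index label of each day's High peak, then broadcast to every row of that day.
peak_idx = toy.groupby('Date')['High'].idxmax()
row_peak = toy['Date'].map(peak_idx)

# Lowest Low among the rows strictly after the peak, per day.
after_mask = toy.index.to_numpy() > row_peak.to_numpy()
low_after = toy[after_mask].groupby('Date')['Low'].min()

# Peak High minus that Low, written onto the peak rows only; other rows stay NaN.
# reindex keeps a NaN for any day whose peak is its last row (nothing comes after it).
peak_high = toy.loc[peak_idx.to_numpy(), 'High'].to_numpy()
toy.loc[peak_idx.to_numpy(), 'Range'] = peak_high - low_after.reindex(peak_idx.index).to_numpy()

print(toy)
```

Changing `>` to `>=` in the mask would let the peak row's own Low count as a candidate, which is a judgment call the question leaves open.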
