How to get max of a slice of a dataframe based on column values?
I'm looking to make a new column, MaxPriceBetweenEntries, based on the max() of a slice of the dataframe:
idx Price EntryBar ExitBar
0 10.00 0 1
1 11.00 NaN NaN
2 10.15 2 4
3 12.14 NaN NaN
4 10.30 NaN NaN
turned into
idx Price EntryBar ExitBar MaxPriceBetweenEntries
0 10.00 0 1 11.00
1 11.00 NaN NaN NaN
2 10.15 2 4 12.14
3 12.14 NaN NaN NaN
4 10.30 NaN NaN NaN
I can get all the rows with an EntryBar or ExitBar value with df.loc[df["EntryBar"].notnull()] and df.loc[df["ExitBar"].notnull()], but I can't use that to set a new column:
df.loc[df["EntryBar"].notnull(),"MaxPriceBetweenEntries"] = df.loc[df["EntryBar"]:df["ExitBar"]]["Price"].max()
but that's effectively a guess at this point, because nothing I'm trying works. Ideally the solution wouldn't involve an explicit loop, because there may be millions of rows.
You can group by the cumulative sum of the non-null EntryBar entries and take the max of each group, using np.where() to apply the result only to the non-null rows:
import numpy as np

df['MaxPriceBetweenEntries'] = np.where(df['EntryBar'].notnull(),
                                        df.groupby(df['EntryBar'].notnull().cumsum())['Price'].transform('max'),
                                        np.nan)
df
Out[1]:
idx Price EntryBar ExitBar MaxPriceBetweenEntries
0 0 10.00 0.0 1.0 11.00
1 1 11.00 NaN NaN NaN
2 2 10.15 2.0 4.0 12.14
3 3 12.14 NaN NaN NaN
4 4 10.30 NaN NaN NaN
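To make the grouping step easier to follow, here is a minimal, self-contained sketch of the same idea with the intermediate values named. The sample frame is rebuilt from the question (without the idx column), and the names is_entry, group_id and group_max are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (idx column omitted)
df = pd.DataFrame({
    "Price": [10.00, 11.00, 10.15, 12.14, 10.30],
    "EntryBar": [0, np.nan, 2, np.nan, np.nan],
    "ExitBar": [1, np.nan, 4, np.nan, np.nan],
})

# True on rows that start a new entry
is_entry = df["EntryBar"].notnull()

# The running count of entries gives every row the id of the entry it follows
group_id = is_entry.cumsum()

# Max Price within each group, broadcast back to every row of the group
group_max = df.groupby(group_id)["Price"].transform("max")

# Keep the group max on entry rows only, NaN everywhere else
df["MaxPriceBetweenEntries"] = np.where(is_entry, group_max, np.nan)
print(df)
```

Note that this treats each window as running from one entry row up to the row before the next entry. That matches the sample data, but it never reads ExitBar.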
Let's try groupby() and where:
s = df['EntryBar'].notna()
df['MaxPriceBetweenEntries'] = df.groupby(s.cumsum())['Price'].transform('max').where(s)
Output:
idx Price EntryBar ExitBar MaxPriceBetweenEntries
0 0 10.00 0.0 1.0 11.00
1 1 11.00 NaN NaN NaN
2 2 10.15 2.0 4.0 12.14
3 3 12.14 NaN NaN NaN
4 4 10.30 NaN NaN NaN
You can forward fill the null values, group by EntryBar, and get the max Price of each group. Use that as the right side of a left join and you should be in business.
df.merge(df.ffill().groupby('EntryBar')['Price'].max().reset_index(name='MaxPriceBetweenEntries'),
on='EntryBar',
how='left')
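Since no output is shown for the merge approach, here is a small runnable sketch of it on the sample data from the question. The frame construction and the names per_entry_max and result are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (idx column omitted)
df = pd.DataFrame({
    "Price": [10.00, 11.00, 10.15, 12.14, 10.30],
    "EntryBar": [0, np.nan, 2, np.nan, np.nan],
    "ExitBar": [1, np.nan, 4, np.nan, np.nan],
})

# Forward fill so every row carries the EntryBar of the entry it follows,
# then take the max Price per entry
per_entry_max = (
    df.ffill()
      .groupby("EntryBar")["Price"]
      .max()
      .reset_index(name="MaxPriceBetweenEntries")
)

# Left join back onto the original frame; rows whose EntryBar is NaN
# find no match on the right side and get NaN in the new column
result = df.merge(per_entry_max, on="EntryBar", how="left")
print(result)
```

The merge returns a new frame rather than modifying df, so the result has to be assigned to a name to be kept.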
Try
df.loc[df['ExitBar'].notna(),'Max']=df.groupby(df['ExitBar'].ffill()).Price.max().values
df
Out[74]:
idx Price EntryBar ExitBar Max
0 0 10.00 0.0 1.0 11.00
1 1 11.00 NaN NaN NaN
2 2 10.15 2.0 4.0 12.14
3 3 12.14 NaN NaN NaN
4 4 10.30 NaN NaN NaN
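All of the approaches above take each window to end where the next entry begins, which holds for the sample data. If a window has to stop exactly at ExitBar, one loop-free possibility is np.maximum.reduceat. The following is only a sketch, under the assumptions that EntryBar and ExitBar are positional row numbers, that every entry row has an ExitBar, and that ExitBar >= EntryBar. The names starts, stops and bounds are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (idx column omitted)
df = pd.DataFrame({
    "Price": [10.00, 11.00, 10.15, 12.14, 10.30],
    "EntryBar": [0, np.nan, 2, np.nan, np.nan],
    "ExitBar": [1, np.nan, 4, np.nan, np.nan],
})

has_entry = df["EntryBar"].notna()

# Assumption: the bars are positional row numbers; stops are exclusive ends
starts = df.loc[has_entry, "EntryBar"].astype(int).to_numpy()
stops = df.loc[has_entry, "ExitBar"].astype(int).to_numpy() + 1

# Pad with -inf so that a stop equal to len(df) is still a valid index
prices = np.append(df["Price"].to_numpy(), -np.inf)

# Interleave start0, stop0, start1, stop1, ...; reduceat reduces between
# consecutive indices, so every even slot holds max(prices[start:stop])
bounds = np.column_stack([starts, stops]).ravel()
window_max = np.maximum.reduceat(prices, bounds)[::2]

df.loc[has_entry, "MaxPriceBetweenEntries"] = window_max
print(df)
```

On the sample data this gives the same two values as the groupby versions, 11.00 and 12.14, because each ExitBar is the last row before the next entry.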