How to get max of a slice of a dataframe based on column values?

I'm looking to make a new column, MaxPriceBetweenEntries, based on the max() of a slice of the dataframe:

idx Price EntryBar ExitBar
0   10.00 0        1
1   11.00 NaN      NaN
2   10.15 2        4
3   12.14 NaN      NaN
4   10.30 NaN      NaN

turned into

idx Price EntryBar ExitBar MaxPriceBetweenEntries
0   10.00 0        1       11.00
1   11.00 NaN      NaN     NaN
2   10.15 2        4       12.14
3   12.14 NaN      NaN     NaN
4   10.30 NaN      NaN     NaN

I can get all the rows with an EntryBar or ExitBar value with df.loc[df["EntryBar"].notnull()] and df.loc[df["ExitBar"].notnull()], but I can't use that to set a new column:

df.loc[df["EntryBar"].notnull(),"MaxPriceBetweenEntries"] = df.loc[df["EntryBar"]:df["ExitBar"]]["Price"].max()

but that's effectively a guess at this point, because nothing I'm trying works. Ideally the solution wouldn't involve a loop directly because there may be millions of rows.

You can group by the cumulative sum of non-null entries and take the max, using np.where() to only apply it to non-null rows:

df['MaxPriceBetweenEntries'] = np.where(df['EntryBar'].notnull(),
                                        df.groupby(df['EntryBar'].notnull().cumsum())['Price'].transform('max'),
                                        np.nan)
df
Out[1]: 
   idx  Price  EntryBar  ExitBar  MaxPriceBetweenEntries
0    0  10.00       0.0      1.0                   11.00
1    1  11.00       NaN      NaN                     NaN
2    2  10.15       2.0      4.0                   12.14
3    3  12.14       NaN      NaN                     NaN
4    4  10.30       NaN      NaN                     NaN
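To make the grouping step easier to follow, here is a minimal self-contained sketch that rebuilds the sample frame from the question and exposes the intermediate values. The names is_entry, group_id and group_max are illustrative only and do not appear in the answer. Note that each group runs from an entry row up to the row before the next entry, so ExitBar is never consulted; on the sample data the two happen to coincide.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Price": [10.00, 11.00, 10.15, 12.14, 10.30],
    "EntryBar": [0, np.nan, 2, np.nan, np.nan],
    "ExitBar": [1, np.nan, 4, np.nan, np.nan],
})

is_entry = df["EntryBar"].notna()

# Every entry row bumps the counter, so all rows up to the next
# entry share one group id: 1, 1, 2, 2, 2.
group_id = is_entry.cumsum()

# Broadcast each group's maximum Price back onto all of its rows.
group_max = df.groupby(group_id)["Price"].transform("max")

# Keep the maximum only on the entry rows, NaN elsewhere.
df["MaxPriceBetweenEntries"] = np.where(is_entry, group_max, np.nan)
print(df)
```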

Let's try groupby() and where:

s = df['EntryBar'].notna()
df['MaxPriceBetweenEntries'] = df.groupby(s.cumsum())['Price'].transform('max').where(s)

Output:

   idx  Price  EntryBar  ExitBar  MaxPriceBetweenEntries
0    0  10.00       0.0      1.0                   11.00
1    1  11.00       NaN      NaN                     NaN
2    2  10.15       2.0      4.0                   12.14
3    3  12.14       NaN      NaN                     NaN
4    4  10.30       NaN      NaN                     NaN

You can forward fill the null values, group by entry and get the max of that group's Price. Use that as the right side of a left join and you should be in business.

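# Max Price per forward-filled EntryBar, as a two-column lookup table
max_per_entry = (df.ffill()
                   .groupby('EntryBar')['Price'].max()
                   .reset_index(name='MaxPriceBetweenEntries'))
# merge returns a new frame, so assign the result back
df = df.merge(max_per_entry, on='EntryBar', how='left')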

Try

df.loc[df['ExitBar'].notna(),'Max']=df.groupby(df['ExitBar'].ffill()).Price.max().values
df
Out[74]: 
   idx  Price  EntryBar  ExitBar    Max
0    0  10.00       0.0      1.0  11.00
1    1  11.00       NaN      NaN    NaN
2    2  10.15       2.0      4.0  12.14
3    3  12.14       NaN      NaN    NaN
4    4  10.30       NaN      NaN    NaN
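The approaches in this thread group rows between consecutive markers rather than reading the EntryBar/ExitBar pair as slice bounds. If an exit can occur before the next entry, one possible loop-free variant is np.maximum.reduceat. This is only a sketch, and it assumes that EntryBar and ExitBar hold 0-based row positions, that ExitBar is inclusive (as the expected output in the question suggests), and that ExitBar >= EntryBar on every row that has both. The names has_span, starts, stops and bounds are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Price": [10.00, 11.00, 10.15, 12.14, 10.30],
    "EntryBar": [0, np.nan, 2, np.nan, np.nan],
    "ExitBar": [1, np.nan, 4, np.nan, np.nan],
})

# Assumption: EntryBar/ExitBar are 0-based row positions, ExitBar inclusive.
has_span = df["EntryBar"].notna() & df["ExitBar"].notna()
starts = df.loc[has_span, "EntryBar"].to_numpy(dtype=np.intp)
stops = df.loc[has_span, "ExitBar"].to_numpy(dtype=np.intp) + 1  # exclusive end

# Pad with -inf so a stop equal to len(df) is still a valid reduceat index.
prices = np.append(df["Price"].to_numpy(dtype=float), -np.inf)

# Interleave the bounds as start0, stop0, start1, stop1, ...
# Every even-numbered result is then max(prices[start:stop]);
# the odd-numbered results cover the gaps and are discarded.
bounds = np.column_stack([starts, stops]).ravel()
span_max = np.maximum.reduceat(prices, bounds)[::2]

df["MaxPriceBetweenEntries"] = np.nan
df.loc[has_span, "MaxPriceBetweenEntries"] = span_max
print(df)
```

On the sample data this gives the same column as the groupby versions. The results would differ only where an ExitBar falls short of the row before the next entry.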
