How do I find the max value of a particular column between 2 dates, based on a condition from another column, in Python?
Can I get some help on how to find the max value of a particular column between two dates, based on a condition from another column?
I have a df like the one below, and I need to find the max value of the ['high'] column in the rows between two 'act' values in the ['mark'] column, within the same ['symbol'], and store the result in a new column.
i.e. find the max of 'high' for APPLE between 04/03/2021 and 09/03/2021, as both of those dates have "act" in the mark column. [There are more 'act' markers in the column, but due to space constraints I'm sharing a short version here.]
Similarly for ORANGE between 04/03/2021 and 10/03/2021.
It should not do this calculation for the "act" marker for APPLE on 09/03, as there is no further 'act' for APPLE after that.
Data:
date | symbol | open | high | low | close | mark |
---|---|---|---|---|---|---|
03/03/2021 | APPLE | 732 | 754.95 | 723.4 | 729.85 | |
04/03/2021 | APPLE | 733.25 | 765.7 | 715.85 | 752.45 | act |
05/03/2021 | APPLE | 752.45 | 761 | 730.5 | 748.95 | |
08/03/2021 | APPLE | 762.7 | 767.8 | 744.2 | 748.4 | |
09/03/2021 | APPLE | 755.55 | 759.4 | 738.65 | 750.75 | act |
10/03/2021 | APPLE | 757.5 | 753.1 | 743 | 745.35 | |
12/03/2021 | APPLE | 743 | 752.1 | 723 | 728.15 | |
15/03/2021 | APPLE | 727.8 | 727.8 | 706.05 | 719.05 | |
03/03/2021 | ORANGE | 2406 | 2417.7 | 2375.8 | 2402.1 | |
04/03/2021 | ORANGE | 2380 | 2435 | 2350 | 2417.1 | act |
05/03/2021 | ORANGE | 2399 | 2423.9 | 2377.1 | 2387.1 | |
08/03/2021 | ORANGE | 2383 | 2413.5 | 2360.05 | 2382.7 | |
09/03/2021 | ORANGE | 2400 | 2444 | 2396.15 | 2422.7 | |
10/03/2021 | ORANGE | 2446 | 2446 | 2415.55 | 2431.95 | act |
12/03/2021 | ORANGE | 2442.8 | 2464.65 | 2397 | 2401.35 | |
15/03/2021 | ORANGE | 2402.55 | 2427.55 | 2343.05 | 2355 | |
OK, I've taken a crack at this - first I recreated the dataframe:
import pandas as pd

# A list (not a set) keeps the rows in order, and None fills the
# missing 'mark' value so every tuple has seven elements.
data = [
    ("03/03/2021", "APPLE", 732, 754.95, 723.4, 729.85, None),
    ("04/03/2021", "APPLE", 733.25, 765.7, 715.85, 752.45, "act"),
    ("05/03/2021", "APPLE", 752.45, 761, 730.5, 748.95, None),
    ("08/03/2021", "APPLE", 762.7, 767.8, 744.2, 748.4, None),
    ("09/03/2021", "APPLE", 755.55, 759.4, 738.65, 750.75, "act"),
    ("10/03/2021", "APPLE", 757.5, 753.1, 743, 745.35, None),
    ("12/03/2021", "APPLE", 743, 752.1, 723, 728.15, None),
    ("15/03/2021", "APPLE", 727.8, 727.8, 706.05, 719.05, None),
    ("03/03/2021", "ORANGE", 2406, 2417.7, 2375.8, 2402.1, None),
    ("04/03/2021", "ORANGE", 2380, 2435, 2350, 2417.1, "act"),
    ("05/03/2021", "ORANGE", 2399, 2423.9, 2377.1, 2387.1, None),
    ("08/03/2021", "ORANGE", 2383, 2413.5, 2360.05, 2382.7, None),
    ("09/03/2021", "ORANGE", 2400, 2444, 2396.15, 2422.7, None),
    ("10/03/2021", "ORANGE", 2446, 2446, 2415.55, 2431.95, "act"),
    ("12/03/2021", "ORANGE", 2442.8, 2464.65, 2397, 2401.35, None),
    ("15/03/2021", "ORANGE", 2402.55, 2427.55, 2343.05, 2355, None),
]
df = (
    pd.DataFrame(data, columns=("date", "symbol", "open", "high", "low", "close", "mark"))
    .sort_values(by=["symbol", "date"])
    .fillna("")
    .reset_index(drop=True)
)
I figure that what you want to do is a simple max on a group-by. The tricky part is manipulating your data so that it conforms to what group-by expects - that is, a field on which to group.
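To illustrate that end goal on a toy frame (the names here are made up, not from the question's data): once every row carries a group label, the per-group maximum is a one-liner.

```python
import pandas as pd

# 'grp' stands in for the grouping field we still have to build;
# 'val' is the column we want the per-group maximum of.
toy = pd.DataFrame({"grp": ["a", "a", "b", "b"], "val": [1, 5, 3, 2]})
print(toy.groupby("grp")["val"].max())  # a -> 5, b -> 3
```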
def block_diff(series, trigger, start_stop=False):
    """Return a boolean Series flagging the rows between occurrences of
    `trigger`. With start_stop=True, both the opening and the closing
    trigger rows are included in the block."""
    toggle = False
    rs = []
    for i, v in series.items():  # iteritems() was removed in pandas 2.0
        if v == trigger:
            if start_stop and toggle:
                # closing trigger: include this row, then leave the block
                rs.append(toggle)
                toggle = not toggle
            elif start_stop and not toggle:
                # opening trigger: enter the block and include this row
                toggle = not toggle
                rs.append(toggle)
            elif not start_stop:
                toggle = not toggle
                rs.append(toggle)
        else:
            rs.append(toggle)
    return pd.Series(rs)
The idea behind block_diff is to block out the regions that are going to feature in the group-by. The function accepts a series, a matching trigger value, and a start_stop flag to fine-tune the behavior.
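A quick way to see what it produces - and to cross-check it: for start_stop=True the same mask can be built without a loop, since a cumulative count of the trigger rows is odd exactly while a block is open, and OR-ing in the trigger rows themselves makes the mask inclusive of the closing marker. This loop-free version is my own sketch, not part of the code above:

```python
import pandas as pd

# Mini mark column shaped like the APPLE rows in the question.
marks = pd.Series(["", "act", "", "", "act", "", "", ""])

is_act = marks.eq("act")
# Odd cumulative count -> we are between an opening "act" and the next one;
# OR in the "act" rows so the closing marker is included too.
in_block = is_act.cumsum().mod(2).eq(1) | is_act
print(in_block.tolist())
# [False, True, True, True, True, False, False, False]
```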
If I apply block_diff to the dataframe, using the returned True/False values as an index to populate a copy of the grouping variable, and store the results in a new field called act_block, then I get a unique grouping field that also functions as a start-stop filter. At the same time, I also create an additional column called act_sequence, which we'll use later to identify the initial starting row of each sub-group.
df['act_block'] = df[block_diff(df['mark'], "act", True)]['symbol']
df['act_sequence'] = df.groupby("act_block").cumcount()
df
date symbol open high low close mark act_block act_sequence
0 03/03/2021 APPLE 732.00 754.95 723.40 729.85 NaN 0
1 04/03/2021 APPLE 733.25 765.70 715.85 752.45 act APPLE 0
2 05/03/2021 APPLE 752.45 761.00 730.50 748.95 APPLE 1
3 08/03/2021 APPLE 762.70 767.80 744.20 748.40 APPLE 2
4 09/03/2021 APPLE 755.55 759.40 738.65 750.75 act APPLE 3
5 10/03/2021 APPLE 757.50 753.10 743.00 745.35 NaN 1
6 12/03/2021 APPLE 743.00 752.10 723.00 728.15 NaN 2
7 15/03/2021 APPLE 727.80 727.80 706.05 719.05 NaN 3
8 03/03/2021 ORANGE 2406.00 2417.70 2375.80 2402.10 NaN 4
9 04/03/2021 ORANGE 2380.00 2435.00 2350.00 2417.10 act ORANGE 0
10 05/03/2021 ORANGE 2399.00 2423.90 2377.10 2387.10 ORANGE 1
11 08/03/2021 ORANGE 2383.00 2413.50 2360.05 2382.70 ORANGE 2
12 09/03/2021 ORANGE 2400.00 2444.00 2396.15 2422.70 ORANGE 3
13 10/03/2021 ORANGE 2446.00 2446.00 2415.55 2431.95 act ORANGE 4
14 12/03/2021 ORANGE 2442.80 2464.65 2397.00 2401.35 NaN 5
15 15/03/2021 ORANGE 2402.55 2427.55 2343.05 2355.00 NaN 6
Now we can do a simple groupby on act_block, saving the results into a series called max_groups:
max_groups = df.groupby("act_block")["high"].max()
act_block
APPLE 767.8
ORANGE 2446.0
Name: high, dtype: float64
Take this series and merge it with the original dataframe - if we do this with a filter, the max_vals object will inherit the original dataframe's index, allowing us to do a pd.concat to selectively join the two objects together and produce the intended output. (Because both sides of the merge have a 'high' column, the merged maximum comes back under the suffix high_y.)
max_vals = df.merge(max_groups, left_on=["act_block"], right_on="act_block",how="left")[(df['act_sequence']==0)].fillna("")['high_y']
max_vals.name="max_val"
new_df = pd.concat([df, max_vals], axis=1).fillna("")
new_df = new_df[["date","symbol","open","high","low","close","mark","max_val"]]
new_df
| date | symbol | open | high | low | close | mark | max_val |
---|---|---|---|---|---|---|---|---|
0 | 03/03/2021 | APPLE | 732.00 | 754.95 | 723.40 | 729.85 | | |
1 | 04/03/2021 | APPLE | 733.25 | 765.70 | 715.85 | 752.45 | act | 767.8 |
2 | 05/03/2021 | APPLE | 752.45 | 761.00 | 730.50 | 748.95 | | |
3 | 08/03/2021 | APPLE | 762.70 | 767.80 | 744.20 | 748.40 | | |
4 | 09/03/2021 | APPLE | 755.55 | 759.40 | 738.65 | 750.75 | act | |
5 | 10/03/2021 | APPLE | 757.50 | 753.10 | 743.00 | 745.35 | | |
6 | 12/03/2021 | APPLE | 743.00 | 752.10 | 723.00 | 728.15 | | |
7 | 15/03/2021 | APPLE | 727.80 | 727.80 | 706.05 | 719.05 | | |
8 | 03/03/2021 | ORANGE | 2406.00 | 2417.70 | 2375.80 | 2402.10 | | |
9 | 04/03/2021 | ORANGE | 2380.00 | 2435.00 | 2350.00 | 2417.10 | act | 2446 |
10 | 05/03/2021 | ORANGE | 2399.00 | 2423.90 | 2377.10 | 2387.10 | | |
11 | 08/03/2021 | ORANGE | 2383.00 | 2413.50 | 2360.05 | 2382.70 | | |
12 | 09/03/2021 | ORANGE | 2400.00 | 2444.00 | 2396.15 | 2422.70 | | |
13 | 10/03/2021 | ORANGE | 2446.00 | 2446.00 | 2415.55 | 2431.95 | act | |
14 | 12/03/2021 | ORANGE | 2442.80 | 2464.65 | 2397.00 | 2401.35 | | |
15 | 15/03/2021 | ORANGE | 2402.55 | 2427.55 | 2343.05 | 2355.00 | | |
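As a footnote, the same max_val column can be produced more compactly with groupby(...).transform("max"). The sketch below is my own alternative (not the code above) on a cut-down frame, and it assumes at most one 'act' pair per symbol, as in the trimmed sample:

```python
import pandas as pd

# Cut-down frame: just the APPLE rows' high/mark columns.
df = pd.DataFrame({
    "symbol": ["APPLE"] * 5,
    "high":   [754.95, 765.7, 761.0, 767.8, 759.4],
    "mark":   ["", "act", "", "", "act"],
})

is_act = df["mark"].eq("act")
in_block = is_act.cumsum().mod(2).eq(1) | is_act   # rows between the acts, inclusive
opening = is_act & is_act.cumsum().mod(2).eq(1)    # the first "act" of the pair

# Broadcast the per-symbol max over the in-block rows, then keep it only
# on the opening "act" row and blank it everywhere else.
block_max = df["high"].where(in_block).groupby(df["symbol"]).transform("max")
df["max_val"] = block_max.where(opening, "")
print(df["max_val"].tolist())
# ['', 767.8, '', '', '']
```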