How to find the max value of a particular column between two dates based on a condition from another column in Python

Question

Can I get some help on how to find the max value of a particular column between two dates, based on a condition from another column?

I have a df like the one below, and I need to find the max value of the ['high'] column over the rows between two 'act' values in the ['mark'] column, within the same ['symbol'], and store the value in a new column.

i.e. find the max of high for APPLE between 04/03/2021 and 09/03/2021, as both these dates have "act" in the mark column. [There are more 'act' marks in the column, but due to space constraints I'm sharing a short version here.]

Similarly for ORANGE between 04/03/2021 and 10/03/2021.

It should not do this calculation for the "act" marker for APPLE on 09/03/2021, as there is no further act for APPLE after that.

Data (dates are DD/MM/YYYY):

date        symbol  open     high     low      close    mark
03/03/2021  APPLE   732      754.95   723.4    729.85
04/03/2021  APPLE   733.25   765.7    715.85   752.45   act
05/03/2021  APPLE   752.45   761      730.5    748.95
08/03/2021  APPLE   762.7    767.8    744.2    748.4
09/03/2021  APPLE   755.55   759.4    738.65   750.75   act
10/03/2021  APPLE   757.5    753.1    743      745.35
12/03/2021  APPLE   743      752.1    723      728.15
15/03/2021  APPLE   727.8    727.8    706.05   719.05
03/03/2021  ORANGE  2406     2417.7   2375.8   2402.1
04/03/2021  ORANGE  2380     2435     2350     2417.1   act
05/03/2021  ORANGE  2399     2423.9   2377.1   2387.1
08/03/2021  ORANGE  2383     2413.5   2360.05  2382.7
09/03/2021  ORANGE  2400     2444     2396.15  2422.7
10/03/2021  ORANGE  2446     2446     2415.55  2431.95  act
12/03/2021  ORANGE  2442.8   2464.65  2397     2401.35
15/03/2021  ORANGE  2402.55  2427.55  2343.05  2355

Answer 1

OK, I've taken a crack at this. First I recreated the dataframe:

import pandas as pd

# A list (not a set) keeps the rows in their original order.
# Rows without a mark are one field shorter; pandas pads the missing "mark" with None.
data = [("03/03/2021", "APPLE", 732, 754.95, 723.4, 729.85),
        ("04/03/2021", "APPLE", 733.25, 765.7, 715.85, 752.45, "act"),
        ("05/03/2021", "APPLE", 752.45, 761, 730.5, 748.95),
        ("08/03/2021", "APPLE", 762.7, 767.8, 744.2, 748.4),
        ("09/03/2021", "APPLE", 755.55, 759.4, 738.65, 750.75, "act"),
        ("10/03/2021", "APPLE", 757.5, 753.1, 743, 745.35),
        ("12/03/2021", "APPLE", 743, 752.1, 723, 728.15),
        ("15/03/2021", "APPLE", 727.8, 727.8, 706.05, 719.05),
        ("03/03/2021", "ORANGE", 2406, 2417.7, 2375.8, 2402.1),
        ("04/03/2021", "ORANGE", 2380, 2435, 2350, 2417.1, "act"),
        ("05/03/2021", "ORANGE", 2399, 2423.9, 2377.1, 2387.1),
        ("08/03/2021", "ORANGE", 2383, 2413.5, 2360.05, 2382.7),
        ("09/03/2021", "ORANGE", 2400, 2444, 2396.15, 2422.7),
        ("10/03/2021", "ORANGE", 2446, 2446, 2415.55, 2431.95, "act"),
        ("12/03/2021", "ORANGE", 2442.8, 2464.65, 2397, 2401.35),
        ("15/03/2021", "ORANGE", 2402.55, 2427.55, 2343.05, 2355)]

# The outer parentheses let the method chain span several lines.
# Sorting the DD/MM/YYYY strings only works here because every row is in March 2021.
df = (pd.DataFrame(data,
                   columns=("date", "symbol", "open", "high", "low", "close", "mark"))
        .sort_values(by=["symbol", "date"])
        .fillna("")            # empty string instead of None in "mark"
        .reset_index(drop=True))
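The sort above compares the date strings character by character. That gives the right order here only because every row falls in March 2021; DD/MM/YYYY strings do not sort chronologically across months or years. A minimal sketch of a safer sort, assuming the dates are always day-first as in the question (the rows below are hypothetical), parses them with pd.to_datetime and an explicit format before sorting:

```python
import pandas as pd

# Hypothetical sample: day-first dates spanning two months
df = pd.DataFrame({
    "date": ["15/03/2021", "02/04/2021", "09/03/2021"],
    "symbol": ["APPLE", "APPLE", "APPLE"],
    "high": [727.8, 740.0, 759.4],
})

# Parse explicitly as day/month/year, so 02/04/2021 is 2 April, not 4 February
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Sort per symbol in true chronological order
df = df.sort_values(["symbol", "date"]).reset_index(drop=True)
print(df)
```

Sorting the raw strings would have put 02/04/2021 first; after parsing it comes last.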

I figure that what you want to do is a simple max on a group-by. The tricky part is manipulating your data so it conforms with what group-by expects, that is, a field on which to group.
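A minimal illustration of that idea with a hypothetical toy frame: once every row carries a label saying which block it belongs to, the per-block maximum is a single groupby call.

```python
import pandas as pd

# Hypothetical toy data: 'block' plays the role of the grouping field we need to construct
toy = pd.DataFrame({
    "block": ["A", "A", "A", "B", "B"],
    "high": [10.0, 12.5, 11.0, 7.0, 9.5],
})

# One maximum per block label
block_max = toy.groupby("block")["high"].max()
print(block_max)
```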

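def block_diff(series, trigger, start_stop=False):
    """Return a boolean Series that is True inside blocks delimited by `trigger`."""
    toggle = False  # True while we are inside a block
    rs = list()
    for v in series:  # Series.iteritems() was removed in pandas 2.0
        if v == trigger:
            if start_stop and toggle:
                # closing marker: include it in the block, then switch off
                rs.append(toggle)
                toggle = not toggle
            elif start_stop and not toggle:
                # opening marker: switch on, then include it in the block
                toggle = not toggle
                rs.append(toggle)
            elif not start_stop:
                # plain toggle: every marker flips the state
                toggle = not toggle
                rs.append(toggle)
        else:
            rs.append(toggle)
    # reuse the input index so the mask aligns with the dataframe rows
    return pd.Series(rs, index=series.index)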

The function above is used to block out the regions that should feature in the group-by. It accepts a series, a trigger value to match, and a start_stop flag to fine-tune the behaviour: with start_stop=True both the opening and the closing trigger rows are marked True, while with start_stop=False the closing trigger row is already marked False.
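For comparison, the same start/stop mask can be sketched without a Python loop. This is an alternative, not the answer's method, and the name act_mask is hypothetical: count the markers seen so far within each symbol with a grouped cumsum; a row is inside a block when that count is odd, and a marker row with an even count is the closing marker, which start_stop=True also includes. Because the count restarts for every symbol, an unmatched marker in one symbol cannot leak into the next one.

```python
import pandas as pd

def act_mask(df, trigger="act", mark_col="mark", group_col="symbol"):
    """Boolean mask: True from an opening trigger to its closing trigger (inclusive)."""
    is_trigger = df[mark_col].eq(trigger)
    # Number of triggers seen so far, restarting for every symbol
    n_seen = is_trigger.astype(int).groupby(df[group_col]).cumsum()
    inside = n_seen % 2 == 1                   # odd count: an open block
    closing = is_trigger & (n_seen % 2 == 0)   # even count on a trigger row: closing marker
    return inside | closing

# Hypothetical miniature of the question's layout
df = pd.DataFrame({
    "symbol": ["APPLE"] * 5 + ["ORANGE"] * 4,
    "mark": ["", "act", "", "act", "", "", "act", "", "act"],
})
print(df.assign(in_block=act_mask(df)))
```

On the question's sample data, act_mask(df) gives the same True/False pattern as block_diff(df['mark'], "act", True).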

If I apply block_diff to the dataframe and use the returned True/False values as a boolean index to populate a copy of the grouping variable (symbol), storing the results in a new field called act_block, I get a grouping field that also works as a start-stop filter: rows inside a block carry the symbol name, rows outside it get NaN. At the same time, I create an additional column called act_sequence, which we'll use later to identify the starting row of each sub-group.
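For reference, groupby(...).cumcount() numbers the rows of each group 0, 1, 2, ... in their original order, so act_sequence == 0 marks the first row of every group, which for a block is its opening act row. A toy illustration with hypothetical labels:

```python
import pandas as pd

labels = pd.Series(["APPLE", "APPLE", "ORANGE", "APPLE", "ORANGE"])

# Position of each row within its own group, counting from 0
seq = labels.groupby(labels).cumcount()
print(seq.tolist())

# Rows that start their group
print(labels[seq == 0])
```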

df['act_block'] = df[block_diff(df['mark'], "act", True)]['symbol']
df['act_sequence'] = df.groupby("act_block").cumcount()
df

    date        symbol  open    high    low     close   mark    act_block   act_sequence
0   03/03/2021  APPLE   732.00  754.95  723.40  729.85          NaN         0
1   04/03/2021  APPLE   733.25  765.70  715.85  752.45  act     APPLE       0
2   05/03/2021  APPLE   752.45  761.00  730.50  748.95          APPLE       1
3   08/03/2021  APPLE   762.70  767.80  744.20  748.40          APPLE       2
4   09/03/2021  APPLE   755.55  759.40  738.65  750.75  act     APPLE       3
5   10/03/2021  APPLE   757.50  753.10  743.00  745.35          NaN         1
6   12/03/2021  APPLE   743.00  752.10  723.00  728.15          NaN         2
7   15/03/2021  APPLE   727.80  727.80  706.05  719.05          NaN         3
8   03/03/2021  ORANGE  2406.00 2417.70 2375.80 2402.10         NaN         4
9   04/03/2021  ORANGE  2380.00 2435.00 2350.00 2417.10 act     ORANGE      0
10  05/03/2021  ORANGE  2399.00 2423.90 2377.10 2387.10         ORANGE      1
11  08/03/2021  ORANGE  2383.00 2413.50 2360.05 2382.70         ORANGE      2
12  09/03/2021  ORANGE  2400.00 2444.00 2396.15 2422.70         ORANGE      3
13  10/03/2021  ORANGE  2446.00 2446.00 2415.55 2431.95 act     ORANGE      4
14  12/03/2021  ORANGE  2442.80 2464.65 2397.00 2401.35         NaN         5
15  15/03/2021  ORANGE  2402.55 2427.55 2343.05 2355.00         NaN         6

Now we can do a simple groupby on act_block, saving the results into a series called max_groups:

max_groups = df.groupby("act_block")["high"].max()


act_block
APPLE      767.8
ORANGE    2446.0
Name: high, dtype: float64
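The unlabelled rows (act_block is NaN) do not form a group of their own in this groupby, because groupby drops NaN keys by default (dropna=True). A small check with hypothetical values:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "act_block": ["APPLE", np.nan, "APPLE", np.nan],
    "high": [765.7, 999.0, 767.8, 1000.0],
})

# NaN keys are excluded, so the large values on unlabelled rows are ignored
max_groups = toy.groupby("act_block")["high"].max()
print(max_groups)
```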

Take this series and merge it back onto the original dataframe. A left merge keeps the original row order and numbers the result 0..n-1, the same labels as df (whose index was reset earlier), so the boolean filter built from df (act_sequence == 0, i.e. the first row of each block) lines up with the merged rows and max_vals keeps df's index labels. That lets us pd.concat the two objects side by side, so each block's maximum lands on the row that opens the block and every other row is left blank.
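As an aside, the same max_val column can be sketched without the merge/concat round trip: Series.map looks up each row's block maximum by its act_block label, and Series.where keeps it only on rows with act_sequence == 0 and a real block label. The miniature frame below is hypothetical (it mimics the first APPLE rows of the intermediate df), and the result keeps NaN where the answer uses "", so add .fillna("") if blank strings are wanted.

```python
import pandas as pd

# Hypothetical miniature of the answer's intermediate columns
df = pd.DataFrame({
    "high": [754.95, 765.7, 761.0, 767.8, 759.4, 753.1],
    "act_block": [None, "APPLE", "APPLE", "APPLE", "APPLE", None],
    "act_sequence": [0, 0, 1, 2, 3, 1],
})
max_groups = df.groupby("act_block")["high"].max()

# Look up each row's block maximum by label, then keep it only on the
# first row of a real block and leave NaN everywhere else
df["max_val"] = (df["act_block"].map(max_groups)
                   .where(df["act_sequence"].eq(0) & df["act_block"].notna()))
print(df)
```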

max_vals = df.merge(max_groups, left_on=["act_block"], right_on="act_block",how="left")[(df['act_sequence']==0)].fillna("")['high_y']
max_vals.name="max_val"
new_df = pd.concat([df, max_vals], axis=1).fillna("")
new_df = new_df[["date","symbol","open","high","low","close","mark","max_val"]]

new_df

    date        symbol  open     high     low      close    mark  max_val
0   03/03/2021  APPLE   732.00   754.95   723.40   729.85
1   04/03/2021  APPLE   733.25   765.70   715.85   752.45   act   767.8
2   05/03/2021  APPLE   752.45   761.00   730.50   748.95
3   08/03/2021  APPLE   762.70   767.80   744.20   748.40
4   09/03/2021  APPLE   755.55   759.40   738.65   750.75   act
5   10/03/2021  APPLE   757.50   753.10   743.00   745.35
6   12/03/2021  APPLE   743.00   752.10   723.00   728.15
7   15/03/2021  APPLE   727.80   727.80   706.05   719.05
8   03/03/2021  ORANGE  2406.00  2417.70  2375.80  2402.10
9   04/03/2021  ORANGE  2380.00  2435.00  2350.00  2417.10  act   2446
10  05/03/2021  ORANGE  2399.00  2423.90  2377.10  2387.10
11  08/03/2021  ORANGE  2383.00  2413.50  2360.05  2382.70
12  09/03/2021  ORANGE  2400.00  2444.00  2396.15  2422.70
13  10/03/2021  ORANGE  2446.00  2446.00  2415.55  2431.95  act
14  12/03/2021  ORANGE  2442.80  2464.65  2397.00  2401.35
15  15/03/2021  ORANGE  2402.55  2427.55  2343.05  2355.00
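The answer pairs the markers as start/stop (1st with 2nd, 3rd with 4th, and so on) and groups by the symbol name, so a symbol with several blocks would have them pooled into one maximum. One reading of the question is instead that every act row should get the maximum high from that act up to the next act of the same symbol, with the last act of each symbol left empty (as described for APPLE on 09/03/2021). A minimal sketch under that assumption, using a hypothetical helper max_to_next_act and the symbol/high/mark columns from the question's data; like the answer's start_stop=True mode, both act rows are included in the window:

```python
import numpy as np
import pandas as pd

def max_to_next_act(df, col="high", trigger="act"):
    """Hypothetical helper: for each trigger row, the max of `col` from that row up to
    and including the next trigger row of the same symbol; NaN for every other row."""
    is_act = df["mark"].eq(trigger)
    # Segment number: how many triggers have been seen so far within the symbol
    seg = is_act.astype(int).groupby(df["symbol"]).cumsum()
    # Max over each segment, i.e. from one act up to (but excluding) the next act
    seg_max = df[col].groupby([df["symbol"], seg]).transform("max")
    acts = df[is_act]
    # Value on the next act row of the same symbol (NaN for the last act)
    next_high = acts.groupby("symbol")[col].shift(-1)
    # Include the closing act row; np.maximum propagates NaN, so the last act stays empty
    result = np.maximum(seg_max.loc[acts.index], next_high)
    return result.reindex(df.index)

df = pd.DataFrame({
    "symbol": ["APPLE"] * 8 + ["ORANGE"] * 8,
    "high": [754.95, 765.7, 761, 767.8, 759.4, 753.1, 752.1, 727.8,
             2417.7, 2435, 2423.9, 2413.5, 2444, 2446, 2464.65, 2427.55],
    "mark": ["", "act", "", "", "act", "", "", "",
             "", "act", "", "", "", "act", "", ""],
})
df["max_val"] = max_to_next_act(df)
print(df)
```

On the sample data this puts 767.8 on the APPLE act row of 04/03/2021 and 2446 on the ORANGE act row of 04/03/2021, the same values as the answer, and leaves the last act of each symbol empty.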

Answer 1 (accepted, score 0), 2021-03-30 10:52:16